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ABSTRACT 


Nelson,  Barry  Lee.  Ph.D. ,  Purdue  University,  December  1983*  Variance 
Reduction  in  Simulation  Experiments:  A  Mathematical  -  Statistical 
Framework.  Major  Professor:  Dr.  Bruce  V.  Schmeiser. 

With  the  expanding  use  of  computer  simulation  to  model  and  solve 
industrial  engineering  problems,  there  has  been  increasing  interest  in 
the  development  of  efficient  simulation  techniques.  When  the  concern  is 
for  statistical  efficiency  of  results  that  are  random  variables,  such 
approaches  are  usually  called  variance  reduction  techniques  (VRTs). 

-  Many  of  the  fundamental  ideas  in  simulation,  and  particularly 
techniques  for  efficient  simulation,  had  their  origins  in  the  Monte 
Carlo  estimation  literature.  The  theory  of  sampling  is  another  closely 
related  field  that  predates  the  development  of  simulation.  Although 
there  has  been  significant  research  interest  in  variance  reduction, 
there  have  been  few  attempts  to  structure  and  define  the  discipline. 

VRTs  are  transformations.  They  transform  simulation  experiments 
into  related  experiments  that  yield  better  estimates  of  some  parameters 
of  interest,  where  better  usually  means  more  precise.  This  research 
identifies  and  defines  the  components  from  which  all  variance  reduction 
techniques  are  built.  Given  a  general  mathematical-statistical 
definition  of  simulation  experiments,  these  components  or  classes  of 
transformations  are  shown  to  be  useful,  to  be  mutually  exclusive,  and  to 


X 


generate  all  possible  VRTs  via  composition.  Benefits  of  the  research 
include:  1)  the  facility  to  unambiguously  define  new  or  existing  VRTs, 
eliminating  confusion  that  currently  exists  in  the  literature,  2)  the 
facility  to  decompose  VRTs  into  combinations  of  transformations,  making 
the  relationships  between  VRTs  clear,  3)  the  development  of  a 
theoretical  foundation  for  analytical  treatment  of  VRTs,  and  4)  the 
development  of  a  setting  for  proposing  new  VRTs  and  research  questions. 
In  addition,  increased  understanding  of  the  area  should  -  promote  more  and 
better  application  of  variance  reduction  in  practice. 


INTRODUCTION 


With  the  expanding  use  of  computer  simulation  to  model  and  solve 
industrial  engineering  problems,  there  has  been  increasing  interest  in 
the  development  of  efficient  simulation  techniques.  By  efficient 
techniques  is  meant  approaches  that  produce  accurate  answers  with 
reasonable  computing  cost  and  analyst  effort.  Vhen  the  concern  is  for 
statistical  efficiency  of  results  that  are  random  variables,  such 
approaches  are  usually  called  variance  reduction  techniques  (VRTs). 

Definitions 

In  this  research,  simulation  will  refer  to  digital  computer  models 
of  stochastic  systems.  Often  these  models  are  characterized  by  explicit 
accounting  of  the  passage  of  time,  although  this  is  not  a  requirement. 
"By  simulation  is  meant  the  technique  of  setting  up  a  stochastic  model 
of  a  real  situation,  and  then  performing  sampling  experiments  upon  the 
model."  (Harling,  1958)  The  experiment  is  done  to  obtain  performance 
measures  for  the  system,  but  since  the  models  are  driven  by  stochastic 
inputs,  the  measures  are  only  estimates  of  the  true  performance  of  the 
system. 

Many  of  the  fundamental  ideas  in  simulation,  and  particularly 
techniques  for  efficient  simulation,  had  their  origins  in  the  Monte 
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events.'"  (Carter  and  Ignall,  1975,  p.  608) 


"The  idea  [of  virtual  measures  for  a  particular  inventory  problem]  is 
surely  suggestive  of  the  splitting  and  enrichment  techniques."  (Carter 
and  Ignall,  1975,  p.  614) 


"In  the  slab  problem  considered  above,  the  process,  of  splitting, 
accompanied  by  Russian  Roulette,  may  be  thought  of  as  an  example  of 
importance  sampling  where  the  transport  kernel  is  modified."  (Carter 
and  Cashwell,  1975,  p.  17) 


“Latin  Hypercube  Sampling  is  a  variant  of  stratification  especially 
appropriate  for  multivariate  problems  with  restricted  sampling  budgets." 
(Swain,  1981,  p.  40) 


"It  [Latin  Hypercube  Sampling]  is  an  extension  of  quota  sampling,  and  it 
is  a  first  cousin  to  the  'random  balance'  design... and  to  the  highly 
fractionalized  factorial  designs...  and  to  lattice  sampling."  (McKay, 
Beckman  and  Conover,  1979,  p.  243) 


"It  is  interesting  to  note  that  the  exponential  transform  is  a  form  of 
quota  sampling."  (Kahn,  1950b,  p.  62) 
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"Another  instance  of  control  variate  sampling  is  the  use  of  the  same 
sequence  of  random  numbers  in  two  different  phases  of  the  simulation. 
The  idea  is  to  introduce  a  positive  correlation  for  quantities  that  are 
to  be  subtracted."  (Gentle,  1975,  p.  7) 

"The  idea  behind  stratified  sampling  is  essentially  the  same  as  that  of 
importance  sampling."  (Gentle,  1975,  p.  7) 

"This  last  technique  [stratified  sampling J  is  sort  of  a  combination  of 
Importance  Sampling  and  Systematic  Sampling."  (Kahn,  1956,  p.  155) 

"Indeed  the  family  of  [antithetic  variate]  sampling  plans... is  the 
family  of  systematic  sampling  plans."  (Roach  and  Wright,  1974,  p.  8) 

"Systematic  sampling  plans  form  a  computationally  feasible  subset  of  the 
family  of  antithetic  sampling  plans  originally  described  by  Hammersley, 
Handscomb,  and  Mauldin."  (Roach  and  Wright,  1974,  p.  32) 

"The  method  of  stratified  sampling  is  closely  related  to  that  of  using 
'control  variables.'"  (Hartley,  1977,  p.  23) 

"It  is  possible  to  view  virtual  measures  as,  in  the  words  of  a  referee, 
'a  re- packaging  of  conditional  Monte  Carlo  for  the  estimation  of  rare 


.  -V.--V.V_V 
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Biased  Estimators  (Rubinstein,  1981) 

Weighted  Uniform  Sampling  (Powell  and  Swann,  1966) 


Random  Quadrature  Method  (Rubinstein,  1981) 

Use  of  Orthonormal  Functions  (Hammersley  and  Handscomb,  1 964 ) 

Many  researchers  have  conjectured  that  relationships  ezist  between 
various  VRTs.  Most  of  these  conjectures  are  in  fact  true,  but  can  seem 
contradictory  without  a  unifying  theory  that  makes  relationships  and 
differences  apparent.  Below  are  several  examples  of  statements  that  have 
appeared  in  the  variance  reduction  literature. 

"The  antithetic  variate  technique  is  a  particular  case  of  this  situation 
(regression  method)"  (Hammersley  and  Handscomb,  -1964,  p.  66) 

"...so  the  antithetic  variate  method  is  equivalent  to  using  the  control 
variate  t'«l/2t  -  1/2t",  whose  expectation  is  zero."  (Hammersley  and 
Morton,  1956,  p.  449) 

Control  variates  can  be  "a  special  form  of  the  use  of  the  same  [common] 
random  numbers..."  (Kleijnen,  1974,  .p.  205) 

Common  random  numbers  is  a  special  case  of  control  variates 
(paraphrased).  (Gentle,  1975,  p.  8) 
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Selective  Sampling  (Brenner,  1963) 

Fixed  Sequence  Principle  (Ehrenfeld  and  Ben-Tuvia,  1962) 


Sequential  Sampling  (McGrath  and  Irving,  1973a) 


Poatstratified  Sampling  (Wilson,  1979b) 
Stratification  after  Sampling  (Kleijnen,  1974) 


Importance  Sampling 

Partition  of  Region  (Rubinstein,  1981) 

Correction  Sampling  (Hartley,  1977) 

Multi-Stage  Sampling  (Marshall,  1936) 

Method  of  the  Essential  Sample  (Kohlas,  1982) 

Sampling  with  Probability  Proportional  to  Size  (Moy,  1965) 


Transformations  (McGrath  and  Irving,  1973a) 


Expected  Values  (McGrath  and  Irving,  1973a) 
Conditional  Expectations  (Law  and  Kelton,  1982) 
Conditioned  Sampling  (Garman,  1972) 

Statistical  Estimation  (McGrath  and  Irving,  1973a) 
Virtual  Measures  (Carter  and  Ignall,  1975) 

Prior  Information  (Pritsker  and  Pegden,  1979) 
Reducing  the  Dimensionality  (Rubinstein,  1981) 
Strict  Conditional  Monte  Carlo  (Fox,  1983) 

Extended  Conditional  Monte  Carlo  (Fox,  1983) 


Indirect  Estimation  (Law  and  Kelton,  1982) 


Adjoint  Formulations  (McGrath  and  Irving,  1973a) 
Mathematical  Analog  (Kahn,  1950) 


Conditional  Monte  Carlo  (Hammersley  and  Handscomb,  1964) 
Conditional  Sampling  (Hartley,  1977) 

Classic  Conditional  Monte  Carlo  (Fox,  1983) 
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Correlation  Induction  Strategies  (Schruben,  1979) 


Control  Variables 
Control  Variates 

Control  Variate  Sampling  (Swain,  1981) 

Concomitant  Control  Variables  (Lavenberg  and  Welch,  1 981 ) 
Internal  Controls  (iglehart,  1979) 

Concomitant  Information  (Ehrenfeld  and  Ben-Tuvia,  1962) 
Regression  Sampling  (Kleijnen, *1974) 

Extraction  of  the  Regular  Part  (Shreider,  1966) 

Comparison  Method  (Kohlas,  1982) 


Regression  Methods  (Hammersley  and  Handscomb,  1964) 
Regression  on  Concomitant  Variables  (Gentle,  1975) 


Common  Random  Numbers 

Correlated  Sampling  (Law  and  Kelton,  1982) 

Using  the  same  random  numbers 

Correlation  of  Samples  (Kahn  and  Marshall,  1955) 


History  Reanalysis ■ (McGrath  and  Irving,  1973a) 


Systematic  Sampling 
Simple  Stratified  Sampling 

Dagger  Sampling  (Kumamoto,  Tanaka  and  Inoue,  1980a) 
Sequential  Destruction  Method  (Easton  and  Wong,  1980) 
Systematic  Source  Sampling  (Carter  and  Cashwell,  1975) 

Quasi-random  Numbers  (Hammersley  and  Handscomb,  1964) 

Latin  Hypercube  Sampling  (McKay,  Beckman  and  Conover,  1979) 


Stratified  Sampling 
Stratification  of  Random  Numbers 
Quota  Sampling  (Kahn,  1954) 

Adaptive  Stratified  Sampling 

Critical  Value  Stratified  Sampling  (Surkis,  Gordon  and  Hynes,  1975) 
Representative  Sampling  (Delanius,  1950) 

Bowley-sampling  (Delanius,  1950) 

Neyman-sampling  (Delanius,  1950) 

Proportional  Sampling  (Ehrenfeld  and  Ben-Tuvia,  1962) 

Group  Sampling  (Shreider,  1966) 


v  V  V  .*•  A  ^  .*♦  A  A  , 


any  knowledge,  either  known  with  certainty  or  suspected,  beyond  what  is 
needed  to  draw  samples  from  the  experiment.  A  transformation  is  a 
modification  of  a  problem  situation  so  that  a  variance  reduction  might 
be  achieved.  Combined  with  the  necessary  prior  knowledge, 
transformations  produce  VRTs.  In  a  real  application,  VRTs  are 
implemented  as  algorithms.  The  theoretical  framework  developed  here 
defines  six  classes  of  transformations  and  shows  how  they  are  composed 
into  VRTs.  The  format  used  to  define  VRTs  should  be  illustrative  for 
developing  algorithms. 


Existing  Variance  Reduction  Techniques 

The  number  of  VRTs  and  their  variations  is  staggering.  The 
following  is  a  list  of  YRTs  found  in  the  simulation,  Monte  Carlo,  and 
sampling  literature.  Multiple  names  for  the  same  or  very  similar 
techniques  are  grouped  together,  and  references  are  given  for  names  not 


in  common  use. 


Antithetic  Variates 
Antithetic  Sampling 

Antithetic  Transformation  (Halton,  1979) 

Antithetic  Variate  Sampling  Plans  (Roach  and  Wright,  1977) 
Supplemental  Variables  (Mize  and  Cox,  1968) 

Antithetic  Control  Variables  (Cheng,  1981) 

Complementary  Random  Numbers  (Hiller  and  Lieberman,  1974) 
Complementary  Antithetic  Variates  (George,  1977) 
Correlation  Selection  (Ermakov  and  Zolotukhin,  I960) 

Use  of  Dependent  Variables  (Shreider,  1966) 

Symmetrization  of  the  Integral  (Shreider,  1966) 

Basic  Antithetic  Variates  (Roach  and  Wright,  1977) 
Antithetical  Variables  (Kohlas,  1982) 

Compensation  Methods  (Kohlas,  1982) 

Randomization  Sampling  (Deutsch  and  Schmeiser,  1980) 
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it  ia  possible  (conceptually)  to  represent  simulation  experiments 
similarly  by  thinking  of  the  simulation  as  some  input  distributions  and 
output  transformations  from  which  one  can  sample.  It  may  not  be 
possible  to  write  an  explicit  expression  for  the  integral  in  all  cases. 

The  terms  crude  Monte  Carlo  or  crude  sampling  are  used  to  describe 
the  following  technique  for  estimating  (2.1) 


0.  Formulate  (2.1)  as  in  (2.2) 

1.  Sample  n  values  of  X,  (X^ . . . ,Xq) 

2.  Estimate  e  by 


Z 


1_ 

n 


S  «2Ui) 
i-1  * 


Z  is  an  unbiased  estimator  of  0  with  variance 

Var[z]  - 

where 

o2  -  E[g2(x)-e]2 

Clearly  the  variance  of  Z  can  be  reduced  by  increasing  n. 

VRTs  usually  attempt  to  attain  an  estimator  with  smaller  variance 
for  n  observations,  or  the  same  variance  with  fewer  observations.  It  is 
generally  agreed  that  prior  knowledge  is  required  to  achieve  a  variance 
reduction.  For  the  purposes  of  this  research  prior  knowledge  will  mean 
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LITERATURE  REVIEW 

* 

This  chapter  reviews  the  names  of  and  relationships  between  VRTs  as 
veil  as  several  key  survey  papers.  The  intention  is  not  to  explain 
individual  VRTs  in  detail  here;  see  Chapter  5. 

Definitions 

Simulation,  Monte  Carlo  estimation,  and  sampling  are  defined  as  in 
Chapter  1 .  Recall  that  the  value  of  any  integral  can  be  expressed  as 
the  expected  value  of  a  random  variable.  For  example,  consider  the 
simple  scalar  Integral 

0  -  J  g.(x)dx  (2.1) 

A  1 

where  g^(x)  is  a  real  valued  function  of  elements  in  A,  a  subset  of  the 
real  line.  Nov  if  f(x)  is  a  probability  density  function  on  A,  and 
f(x)  ■  0  only  if  g^(x)  -  0,  then 


and  EtggCX)]  »  0,  where  X  is  a  random  variable  with  density  function 
f(x).  Monte  Carlo  estimation  problems  are  formulated  as  integrals,  and 
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concerning  the  definitions  and  relationships  between  VRTs  is  presented. 
Chapter  3  develops  a  general  mathematical-statistical  definition  of 
simulation  experiments,  which  is  necessary  to  define  the  classes  of 
transformations  and  establish  their  properties  (Chapter  4).  Chapter  3 
reviews  five  well-known  VBTs  in  light  of  the  results  just  presented. 
The  final  chapter  contains  concluding  comments  and  directions  for  future 


research. 
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The  current  research  identifies  and  defines  the  components  from 
vhich  all  variance  reduction  techniques  are  built.  Given  a  general 
mathematical-statistical  definition  of  simulation  experiments,  these 
components  or  classes  of  transformations  are  shown  to  be  useful,  to  be 
mutually  exclusive,  and  to  generate  all  possible  VRTs  via  composition. 
The  scope  of  the  results  is  not  limited  to  simulation  since  Monte  Carlo 
and  sampling  theory  problems  are  special  cases  of  the  general  simulation 
experiment. 

Benefits  of  the  research  include:  1 )  the  facility  to  unambiguously 
define  new  or  existing  VRTs,  eliminating  the  confusion  that  currently 
exists  in  the  literature,  2)  the  facility  to  decompose  VRTs  into 
combinations  of  transformations,  making  the  relationships  between  VRTs 
clear,  3)  the  development  of  a  theoretical  foundation  for  analytical 
treatment  of  VRTs,  and  4)  the  development  of  a  setting  for  proposing 
new  VRTs  and  research  questions.  In  addition,  increased  understanding 
of  the  area  should  promote  more  and  better  application  of  variance 
reduction  in  practice. 

Additional  results  of  the  research  are  a  graphical  scheme  for 
describing  VRTs  (vhich  is  applied  to  five  of  the  most  common 
techniques),  and  an  extensive  bibliography  of  variance  reduction 
literature. 


Organization  of  the  Dissertation 

Chapter  2  of  the  dissertation  contains  a  literature  review 
emphasizing  previous  attempts  to  develop  a  unifying  framework.  In 
addition,  a  brief  examination  of  the  confusion  that  presently  exists 


the  sub  -of  the  variance  and  the  square  of  the  bias.  However,  other 
measures  could  be  proposed.  For  the  purposes  of  this  research,  the  tern 
variance  reduction  will  Bean  the  following  more  general  goal: 


minimize  &z[l(Z,  e)] 


where 


i(e,  e)  -  0 


and 

1(Z,  e)  >  1(Z\  e)  iff  !z  -  e|  >  jz*  -  0! 

where  |  |  is  a  metric.  This  general  loss  function  includes  both 
variance  and  MSE,  as  well  as  others.  The  particular  loss  function  is 
application  dependent. 


Research 

Although  there  has  been  significant  research  interest  in  variance 
reduction,  there  have  been  few  attempts  to  structure  and  define  the 
discipline.  The  primary  exception  is  HcGrath  and  Irving  (1973a).  They 
classify  variance  reduction  techniques  according  to  whether  they  modify 
the  sampling  process,  make  use  of  analytic  equivalences,  or  are  simply 
specialised  techniques.  This  classification  fails  to  show  which 
techniques  are  related  to  or  are  particular  cases  of  others,  or  provide 
insight  into  the  underlying  theory  of  variance  reduction.  In  addition, 
the  catchall  category  of  "specialized  techniques"  is  not  satisfying. 
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considered  over  repeated  realizations;  i.e.  its  expected  squared 
deviation  from  its  own  expectation.  To  account  for  the  greater  effort 
usually  involved  with  achieving  greater  precision,  measures  that 
incorporate  "effort"  have  been  proposed.  Most  take  the  following  form 
(Hammersley  and  Handscomb,  1964): 


2  2 

where  a and  Og  are  the  variances  of  estimators  1  and  2,  respectively, 
and  Sj  and  Sj  are  some  measures  of  the  effort  involved  in  using 
estimators  1  and  2  (computer  or  analyst  time,  for  instance). 

Contrast  the  idea  of  increased  precision  (reduced  variance)  with 
increased  accuracy.  Accuracy  refers  to  the  absolute  deviation  of  the 
value  of  the  estimator  from  the  quantity  to  be  estimated.  In  some 
situations  this  quantity  can  be  bounded.  A  similar  measure  is  the 
expected  value  of  the  difference  between  the  estimator  and  the 
parameter,  called  the  bias.  It  is  clear  that  any  arbitrary  constant  has 
optimal  precision,  but  it  will  probably  be  biased  (unless  one  is  so 
lucky  as  to  select  the  value  to  be  estimated).  It  is  also  clear  that 
variance  reduction  in  the  context  of  numerical  or  quasi-Monte  Carlo 
integration  procedures  has  little  meaning,  while  accuracy  does.  If 
unbiased  estimators  are  employed,  then  precision  is  the  only  quantity  to 
worry  about.  However,  some  VRTs  trade  variance  of  the  estimator  for 
bias,  and  this  may  be  quite  acceptable. 

There  are  many  possible  solutions  to  this  problem  of  definition. 
One  could  talk  about  mean  squared  error  (MSE)  reduction,  since  MSE  is 


potential  error,  and  is  often  employed  to  construct  confidence  intervals 
for  the  unknown  quantity  being  estimated.  The  smaller  the  variance  of 
the  estimator,  the  more  certain  one  is  that  the  estimate  is  not 
misleading.  VHTs  are  usually  considered  to  be  methods  for  achieving  a 
given  level  of  precision  at  reduced  cost,  or  greater  precision  at  the 
same  estimation  cost.  Precision  is  a  quantity  inversely  proportional  to 
the  variance. 

VRTs,  sometimes  called  Monte  Carlo  swindles,  have  long  been  applied 
to  Monte  Carlo  and  sampling  problems.  It  is  possible  to  represent  an 
integral  as  the  expected  value  of  a  random  variable,  to  sample  from  the 
random  variable,  and  to  use  the  sample  average  as  an  estimator  of  the 
integral.  Increasing  the  number  of  observations  will  decrease  the 
variance  of  the  estimator,  but  an  excessive  number  of  observations  may 
be  required  to  achieve  acceptable  precision  in  the  absence  of  variance 
reduction  techniques.  Similarly,  to  obtain  an  estimate  with  acceptable 
precision  from  a  simple  random  sample  of  a  large  and  diverse  population 
an  unreasonably  large  sample  may  be  required. 

More  recently  variance  reduction  techniques— many  direct  analogues 
of  Monte  Carlo  and  sampling  methods— have  been  applied  to  computer 
simulation  experiments.  Increasing  the  length  or  number  of  simulation 
runs  will  improve  the  precision  of  the  estimators,  but  not  without  cost. 
"Sometimes  a  variance  reduction  technique,  properly  applied,  can  make 
the  difference  between  an  impossibly  expensive  simulation  and  a  frugal, 
useful  one"  (Law  and  Kelton,  1982). 

In  the  simulation  and  Monte  Carlo  literature,  variance  reduction 
refers  to  any  attempt  to  decrease  the  variance  of  an  estimator 
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Carlo  estimation  literature.  Monte  Carlo  estimation  refers  to  the  use 
of  probabilistic  models  to  evaluate  mathematically  intractable 
integrals.  These  problems  may  not  be  inherently  stochastic.  The 
difference  between  Monte  Carlo  and  standard  statistical  estimation 
problems  is  that 

In  the  standard  statistical-estimation  problem  both  the 
probability  distribution  and  the  parameter  to  be  estimated  are 
assumed  to  be  fixed;  typically,  given  a  sample  of  n  values 
from  the  distribution,  the  best  (or  minimum  variance)  estimate 
of  the  parameter  is  to  be  found.  In  Monte  Carlo  calculations 
only  the  answer,  is  really  fixed  and  the  problem  is  to 
sample  from  that  distribution  which  produces  the  minimum  (or  a 
substantially  smaller)  variance  estimate  of  this  number,  for 
fixed  cost.  (Kahn  and  Marshall,  1953) 


The  theory  of  sampling  is  another  closely  related  field  that 
predates  '  the  development  of  simulation.  Sampling  refers  to  selecting  a 
subset  of  the  members  of  some  population  to  dispover  or  estimate  some 
characteristic  of  the  whole  population.  The  measures  derived  from  a 
sample  are  in  general  subject  to  random  variation.  "The  purpose  of 
sampling  theory  is  to  make  sampling  more  efficient.  It  attempts  to 
develop  methods  of  sample  selection  and  of  estimation  that  provide,  at 
the  lowest  possible  cost,  estimates  that  are  precise  enough  fcr  our 
purpose."  (Cochran,  1977) 


Variance  Seduction 

In  sampling,  Monte  Carlo  estimation,  and  computer  simulation 
problems,  one  is  often  as  interested  in  how  far  from  the  actual  value 
the  estimate  of  a  quantity  may  be  as  in  the  value  of  the  estimate 
itself.  The  variance  of  the  estimator  is  a  common  measure  of  the 
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"These  involve  importance  sampling,  including  the  special  case  of  the 
exponential  transform,  splitting,  and  Russian  Roulette."  (Steinberg, 
1963,  p.  142) 


"...indeed  the  CV  [control  variate]  technique  is  sometimes  called 
regression  sampling."  (Law  and  Kelton,  1982,  p.  360) 

"The  idea  of  this  technique  [stratified  sampling]  is  similar  to  the  idea 
of  importance  sampling..."  (Rubinstein,  198;,  p.  131) 


Control  Variates  is  a  form  of  correlated  sampling  (paraphrased).  (Kahn 
and  Marshall,  1953,  p.  269) 


"Use  of  quasi-random  numbers  is  essentially  an  instance  of  systematic 
sampling."  (Gentle,  1975,  p.  8) 


"A  special  case  of  the  regression  method  is  the  use  of  antithetic 
sampling."  (Gentle,  1975,  p.  4) 


"The  antithetic- variate  method  is  a  variation  of  the  regression  sampling 
method  introduced  earlier."  (Moy,  1965,  p.  18) 


...in  general,  quota  sampling  may  be  described  as  stratified  sampling 
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with  a  more  or  leas  nonrandom  selection  of  units  within  strata." 
(Cochran,  1963,  p.  137) 


Surveys  and  Frameworks 

A  number  of  survey  papers  and  book  chapters  on  variance  reduction 
have  appeared  in  the  literature.  Host  have  concentrated  on  developing 
formulation,  theory,  and  application  of  a  particular  VRT,  and 
occasionally  they  have  advanced  ideas  about  the  relationships  between 
techniques  or  their  taxonomy.  In  this  section  some  of  these  wo  lies  are 
reviewed,  with  particular  emphasis  on  those  that  propose  a 
classification  scheme. 

An  important  early  survey  paper  is  Kahn  (1936),  which  appeared  in 
the  classic  collection  of  papers  from  a  symposium  on  Monte  Carlo  methods 
held  at  the  University  of  Florida  in  1 954 •  Kahn  uses  the  simple  problem 
of  repeatedly  tossing  two  dice  to  estimate  the  probability  that  the  sum 
is  three  to  illustrate  six  VhTs  that  he  feels  are  useful  in  Monte  Carlo 
and  simulation  studies.  Although  the  problem  is  easily  solved 
analytically,  the  example  makes  the  idea  behind  each  approach  clear. 
He  continues  the  exposition  using  an  integral  formulation  of  the  Monte 
Carlo  estimation  problem  to  describe  each  VRT  from  a  mathematical  point 
of  view.  He  uses  the  integral  problem  because  "it  is  the  application  in 
which  the  ideas  are  most  clearly  defined  (p.147)."  The  development  is 
sophisticated,  considering  a  general  multidimensional  integral  and 
deriving  expressions  for  the  variance  of  each  modified  estimator.  When 
possible,  potential  applications  of  each  VRT  are  given  and  optimal 
implementation  strategies  derived.  Kahn  does  not  attempt  to  define  a 


18 


set  of  underlying  concepts  or  develop  a  framework.  However,  in  an 

earlier  paper  (Kahn,  1950)  he  does  identify  four  general  techniques  for 

reducing  variance  in  the  context  of  neutron  transport  problems;  they 

are:  integration  by  random  sampling,  using  a  mathematical  analog,  quota 

sampling,  and  statistical  estimation.  There  is  considerable  overlap 

between  the  categories,  and  many  existing  VRTs  are  not  covered. 

Steinberg  (1963)  proposes  two  principle  classes  of  VRTs,  those 

designed  to  reduce  the  theoretical  variance  of  each  sample  "history" , 

and  those  that  reduce  the  variance  of  a  set  of  sample  "histories."  A 

"history"  is  essentially  an  observation.  This  breakdown  differentiates 

between  techniques  that  change  the  individual  observations  and  those 

that  reduce  variance  through  a  cumulative  effect.  It  ignores  VRTs  that 

change  the  statistic  (function  of  the  observations)  used. 

Probably  the  most  cited  reference  in  all  the  variance  reduction 

literature  is  Hammersley  and  Handscomb  (1964).  Although  many 

researchers  consider  Monte  Carlo  Methods  to  be  concerned  only  with  Monte 

Carlo  problems,  the  authors  are  also  interested  in  simulation  and  devote 

a  chapter  (Chapter  4)  to  the  subject.  In  fact,  their  definition  of 

Monte  Carlo  methods  is  quite  general: 

Monte  Carlo  methods  comprise  that  branch  of  experimental 
mathematics  which  is  concerned  with  experiments  on  random 
numbers,  (p.  2) 

« 

The  text  presents  a  brief  history  and  overview  of  Monte  Carlo  methods 
and  problems,  develops  the  basic  techniques  (Chapter  5),  and 
demonstrates  their  application  to  problems  in  areas  such  as  solution  of. 
linear  operator  equations,  radiation  shielding  and  nuclear  reaction 
criticality. 
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Like  Kahn,  Hammersley  and  Handscomb  develop  variance  reduction 
techniques  from  an  integral  representation  of  the  problem.  They  propose 
no  general  framework,  but  stress  that  two  basic  concepts  underlying 
several  VRTs  are:  1)  sampling  from  an  advantageous  distribution,  not 

necessarily  the  one  that  naturally  appears  in  the  problem,  and  2) 
replacing  an  estimate  by  an  exact  value  when  possible.  VRTs  are 
described,  then  illustrated  using  a  simple  integral  for  which  an 
analytic  solution  is  known.  A  major  feature  of  the  book  is  a  list  of 
original  or  early  variance  reduction  and  Honte  Carlo  references. 

Another  extensive  list  of  variance  reduction  references  is  found  in 
Kleijnen  (1974).  Kleijnen  is  particularly  interested  in  VRTs  that  can 
be  used  in  a  wide  range  of  simulation  studies.  As  a  result,  he  does  not 
discuss  some  VRTs  that  appear  in  the  Monte  Carlo  literature  and  includes 
one  (Selective  Sampling)  that  is  unique  to  simulation.  A  description  and 
"critical  appraisal"  of  six  VRTs  is  given,  with  extensions,  limitations, 
and  corrections  presented.  Also,  the  combined  use  of  two  well  known 
VRTs  (antithetic  variates  and  common  random  numbers),  and  the  resulting 
dangers,  are  discussed. 

While  Kleijnen  does  not  propose  a  specific  variance  reduction 

framework,  he  opens  his  chapter  with  the  following  comments: 

Some  VRT's  change  the  original  sampling  process  completely.... 

Other  VRT's  use  the  same  sampling  process  as  in  crude 
sampling,  but  after  th£  sampling  has  ended,  they  do  not  use 
the  sample  average  x  but  a  more  sophisticated  estimator. . . . 

Some  VRT's  modify  the  sampling  process  in  a  very  subtle 

Wfty  ease 

While  this  is  clearly  true,  it  fails  to  completely  describe  all  the 
possibilities.  However,  a  slight  modification  of  Kleijnen' s  ideas  is 


exhaustive:  A  VST  can  change  the  inputs  to  the  simulation,  it  can  change 
the  simulation  model,  or  it  can  change  the  simulation  outputs.  This 
framework  has  merit,  but  it  groups  techniques  that  exploit  different 
knowledge  or  problem  characteristics.  These  concepts  deal  more  with 
when  than  what  a  VRT  does. 

Another  survey  paper  is  Gentle  (1975).  Gentle  is  interested  in 
VRTs  that  are  robust  to  deviations  of  the  simulation  or  Monte  Carlo 
model  assumptions  from  reality,  and  deviations  of  the  distributions 
produced  by  the  random  variate  generators  from  the  desired 
distributions.  He  describes  nine  distinct  techniques  and  combinations 
of  some  of  them.  Like  Kleijnen  he  warns  that  "simultaneous  use  of 
variance- reducing  techniques  may  not  be  effective  when  different  methods 
acnieve  reduction  in  conflicting  ways...."  (p.  9) 

McGrath  and  Irving  (1973a)  is  the  first  real  attempt  to  define  the 
concepts  underlying  variance  reduction  methods  and  develop  a  framework 
based  on  them.  The  overall  purpose  of  the  paper  is  to  provide  analysts 
with  some  understanding  of  variance  reduction  techniques  and  a  useful 
guide  for  selecting  a  VRT  for  a  particular  problem  (p.  5)»  In  this 
context  they  identify  the  following  concepts  that  variance  reduction 
techniques  employ  to  increase  the  efficiency  of  simulation:  1)  Modify 
the  simulation  procedure,  2)  Utilize  approximate  or  analytic 
information,  and  3)  Study  the  system  within  a  different  context  or 
abstract  representation.  Based  on  these  concepts  the  authors  propose  the 


following  categories: 
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1 .  Modification  of  the  Sampling  Process 

2.  Use  of  Analytical  Equivalence 

3.  Specialized  Techniques 


The  techniques  which  modify  the  sampling  process  effectively 
alter  the  probability  distributions  of  random  variables  so 
that  the  more  significant  events  are  observed  more  often.  The 
use  of  analytical  equivalence  exploits  analytical  expressions 
and  expected  values  to  explain  or  approximate  the  majority  of 
the  phenomena,  thus  leaving  only  the  most  interesting  portions 
of  the  process  to  be  simulated.  Specialized  approaches 
encompass  the  more  sophisticated  techniques  for  achieving 
variance  reduction  [including  the  combination  of  two  or  more 
techniques  in  the  other  categories}.  (p.  27) 


McGrath  and  Irving  are  able  to  classify  sixteen  VRTs  using  the 
above  scheme,  but  they  comment  that  many  of  these  techniques  are  related 
and  it  is  quite  difficult  to  arrive  at  a  completely  distinct 
classification  structure.  This  is  indeed  the  case.  For  instance,  the 
authors  place  systematic  sampling  and  antithetic  variates  in  categories 

1  and  2,  respectively,  yet  it  was  noted  above  that  Roach  and  Wright 
(1974)  claim  one  is  a  subset  of  the  other.  Also,  there  are  several 
problems  with  the  "Specialized  Techniques"  category.  Since  combinations 
of  other  VRTs  are  contained  here,  it  means  that  techniques  related  only 
by  being  combinations  of  others  are  grouped  together.  Also,  there  is  an 
implicit  assumption  that  all  of  the  VRTs  classified  in  categories  1  and 

2  are  fundamental,  or  else  they  would  be  in  group  3*  However,  it  can  be 
shown  that  some  of  these  VRTs  are  built  up  from  still  more  fundamental 
concepts  (see  Chapter  5).  Finally,  the  category  is  a  catchall  for  any 
VRT  that  does  not  fall  in  either  1  or  2,  which  not  desirable. 
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Like  many  of  the  other  authors,  McGrath  and  Irving  stress  an 
integral  formulation  of  the  general  Monte  Carlo  or  simulation  problem, 
claiming  that  it  is  completely  general.  To  demonstrate  the  generality 
they  formulate  a  network  queuing  problem  in  this  way.  For  most  VfiTa 
covered  in  the  paper,  a  theoretical  development,  comparison  with  crude 
sampling,  and  example  application  are  given.  Also,  a  concluding  section 
of  the  report  provides  a  systematic  procedure  for  selecting  and  applying 
several  of  the  more  important  VRTs. 

Kohlas  (1978)  claims  that  the  most  widely  known  VRTs  can  be  divided 
into  two  groups:  correlation  methods  and  the  methods  of  essential 
sample.  Correlation  methods  are  further  divided  into  comparison  and 
compensation  methods;  both  involve  manipulating  or  taking  advantage  of 
correlation  between  observations  to  increase  statistical  efficiency. 
Essential  sample  methods  attempt  to  concentrate  sampling  in  regions  that 
will  make  "significant  contributions"  to  the  estimate. 

Chapter  4  of  Rubinstein  (1981)  is  essentially  an  updated  version  of 
Hammersley  and  Handacomb's  Chapter  5»  Rubinstein  reviews  most  of  the 
same  techniques  the  previous  authors  did,  occasionally  adding  theorems 
concerning  conditions  that  insure  a  variance  reduction,  an  expression 
for  the  theoretical  variance  of  an  estimator,  or  an  implementation 
algorithm.  Also,  several  less  well-known  VRTs,  some  developed  since  the 
publication  of  Monte  Carlo  Methods,  are  explained.  No  general  concepts 
or  framework  is  presented,  but  a  significant  list  of  references  is 
included. 

A  recent  paper  by  Wilson  ( 1983a)  proposes  another  framework  for 
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classifying  VRTs.  He  has  two  categories: 


1 .  Importance  Methods 

2.  Correlation  Methods 

The  importance  methods  achieve  improved  efficiency  by  concentrating  the 
sampling  in  regions  of  the  input  domain  that  make  the  greatest 
contribution  to  the  integral.  The  correlation  methods  are  further 
divided  into  those  that  induce  favorable  correlation  between  blocks  of 
simulation  runs,  and  those  that  exploit  "inherent"  correlation  between 
output  variables  within  each  run  (p.  l).  The  author  does  not  claim  that 
this  categorization  includes  all  existing  VRTs,  but  he  is  able  to  fit 
eight  well-known  methods  into  the  scheme.  The  primary  purpose  of  the 
paper  is  to  survey  recent  research  in  variance  reduction  and  comment  on 
its  potential  benefit  in  simulation  studies. 

Although  this  chapter  did  not  review  any  sampling  literature,  the 
interested  reader  is  referred  to  Cochran  (1977). 


SIMULATION 


To  provide  a  context  for  the  discussion  of  variance  reduction  and 
the  results  presented  in  the  next  chapter,  a  rigorous  and  detailed 
definition  of  simulation  experiments  is  developed  in  this  chapter.  As  a 
preliminary,  some  notation  is  established  and  the  concept  of  statistical 
"information"  is  discussed. 


Notation 

Descriptions  will  generally  be  in  terms  of  matrices,  columns  of 

matrices,  and  elements  of  matrices.  Letters,  Greek  or  English,  without 

subscripts  will  denote  matrices,  letters  with  single  subscripts  will 

denote  columns,  and  doubly  subscripted  letters  are  elements,  using  the 

usual  row-column  convention.  For  instance,  is  the  i^*1  element  of 

column  vector  X,  ,  which  is  the  kth  column  in  the  matrix  X. 
k 

A  letter  with  subscripts  in  parentheses  indicates  a  set  of 
variables  with  indices  in  a  fixed  set.  *(ab)  would  be  used  to  designate 
all  the  elements  X..  in  X  with  subscripts  in  index  set  (ab),  a  set  that 
would  have  to  be  defined.  The  (  )  is  a  mapping  from  a  single  index  to  a 
set  of  indices. 

Random  variables  will  be  denoted  by  capital  English  letters,  and 
realizations  of  these  random  variables  by  lower  case  letters.  For 


example,  y.  is  a  realization  of  random  variable  Y.  .  Any  notation  that 
is  counter  to  the  above  conventions  will  be  specifically  defined  when 
used. 

Information 

The  information  about  an  unknown  parameter  contained  in  a  random 
variable  refers  to  the  extent  to  which  uncertainty  about  the  value  of 
the  parameter  is  reduced  by  observing  the  random  variable.  The  term 
information  will  always  be  used  in  this  statistical  sense.  It  should 
not  be  confused  with  knowledge  of  the  system  to  be  modeled,  or  knowledge 
of  certain  mathematical  or  statistical  relationships.  Knowledge  relates 
to  what  is  understood  or  recognized  by  the  experimenter,  while 
information  is  an  abstract  statistical  quantity.’  A  specific  measure  of 
information  is  not  needed,  but  any  measure  that  is  used  or  proposed  must 
satisfy  the  requirements  given  below. 

Consider  a  statistical  space  S  ■  (Q,  A,  P  ,  0  €  9) ,  where  Q  is  a 
sample  space,  A  a  o-algebra  of  subsets  of  a  (the  events),  and  P0  is  a 
family  of  probability  measures  on  Q  indexed  by  a  parameter  0.  Although 
this  development  is  quite  general,  Q  and  6  can  be  restricted  to  subsets 
of  some  finite  dimensional  Euclidean  spaces  without  loss  of  generality. 
Define  a  random  variable  Z  to  be  a  measurable  mapping  from  (Q,  A)  to  a 
measurable  space  ( 0,  B)  that  does  not  depend  on  0.  Then  Z  induces  a 
statistical  space  S_  ■  (♦,  B,  P  );  Z  may  be  a  statistic  used  to  estimate 
0,  the  identity  mapping,  or  in  general  any  arbitrary  measurable 
function.  Consider  the  amount  of  information  available  to  estimate  0, 
and  call  1(0)  a  measure  of  this  information  only  if  it  satisfies  the 
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following  properties.^ 

(1)  The  amount  of  information  available  with  respect  to  an  unknown 
parameter  0  is  defined  for  a  statistical  space  S  independently  of  any 
estimation  procedure  used  or  inference  desired. 

(2)  The  amount  of  information  contained  in  Z  equals  the  amount  of 

information  contained  in  the  statistical  space  S  induced  by  Z. 

z 

(3)  The  amount  of  information  contained  in  Z  is  less  than  or  equal 
to  the  amount  of  information  contained  in  the  statistical  space  S  on 
which  Z  is  defined. 

2 

(4)  A  sufficient  statistic  Z  contains  all  of  the  information 
included  in  the  statistical  space  on  which  Z  is  defined.  If  there  is  a 
unique  value  of  Z  (a.s.)  corresponding  to  each  possible  value  of  the 
parameter  0,  then  Z  contains  the  maximum  possible  information;  if  Z  has 
the  same  distribution  for  all  values  of  the  parameter  (a.s.)  then  it 
provides  no  information. 

(5)  The  information  given  by  two  statistically  independent 
functions  defined  on  the  statistical  space  3  is  the  sum  of  the 
information  given  by  each  of  them.  If  the  two  functions  are  not 
independent,  then  there  may  be  a  cumulative  effect  resulting  in  the 
total  information  being  greater  or  smaller  than  the  sum  of  the 
individual  informations. 

(6)  The  efficiency  (as  measured  by  the  variance  of  the  estimator) 
with  which  0  can  be  estimated  is  nondecreasing  in  the  amount  of 
information  contained  in  the  statistical  space  on  which  the  estimator  is 

TI  Barra,  J.R.  (1981),  Mathematical  Basis  of  Statistics,  (L.  Her- 

bach,  translation  ed.),  Academic  Press,  N.Y. 

2.  Bickel,  P.J.  and  K.A.  Doksum  (1977),  Mathematical  Statistics: 

Basic  Ideas  and  Selected  Topics,  Holden-Day,  San  Francisco. 
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based;  greater  information  implies  the  potential  for  more  efficient 
estimation. 

A  well-known  measure  of  the  information  in  a  real  valued  vector  or 

general  multidimensional  random  variable,  X,  about  a  real  valued 

3 

parameter  9,  is  the  Fisher  information  number  ;  if  8  is  a  scalar  then 


where  L  is  the  likelihood  function  of  8  given  X.  Under  certain 

4 

regularity  conditions  the  minimum  variance  attainable  when  estimating  0 

from  X  is  a  function  of  1(0).  The  Fisher  information  measure  extends  in 

a  straightforward  way  to  a  vector  0.  This  measure  also  satisfies  the 

six  properties  stated  above.  Clearly  the  value  of  1(0)  is  independent 

of  any  function  of  X  since  it  depends  only  on  the  probability 

distribution  of  X.  The  logarithmic  form  of  the  measure  implies  that  (4) 

and  (5)  will  hold  because  of  the  product  form  of  the  distribution  of 

independent  random  variables  and  the  factorization  result  for  sufficient 

statistics.  That  (3)  holds  is  well-known,  and  the  Cramer-Rao  lower 

bound^  relates  the  variance  of  an  estimator  to  1(0). 

As  an  illustration  of  some  of  these  properties  consider  the 

2 

following  example:  Let  the  sample  space  be  R  ,  the  two  dimensional 

Euclidean  space,  with  the  probability  measure  being  the  bivariate  normal 

2  2 

distribution  with  identical  marginals  denoted  by  N(ti,y,o  ,  o  ,  p).  A 

random  sample  (X^ ,X^)  has  Fisher  information  measure 

Fisher,  R.A.  (1925),  "Theory  of  Statistical  Estimation,"  Proc. 
Camb.  Phil.  Soc.,  22,  700-725. 

4.  Rao,  C.R.  (1973),  Linear  Statistical  Inference  and  Its 
Applications,  Wiley,  N.Y. 

5.  Bickel,  P.J.  and  K.A.  Doksum  (1977),  Mathematical  Statistics: 
Basic  Ideas  and  Selected  Topics,  Holden-Day,  San  Francisco. 
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l(u) - 

(l+p)a 


relative  to  u.  Note  that  in  this  example  l(u)  does  not  depend  on  u, 

which  is  not  true  in  general.  There  are  two  cases  to  examine:  If 

2 

p  -  0,  so  that  X1  and  X2  are  independent  N(u,o  )  random  variables,  then 


The  total  information  is  the  sum  of  the  information  contained  in  the 
independent  sources.  Note  that  the  amount  of  information  i3  increased 
if  o  is  decreased.  If,  on  the  other  hand,  p  t  0  there  is  a  cumulative 
effect  leading  to  more  (p  <  0)  or  less  (p  >  0)  information  than  in  the 
independent  case.  For  this  example  the  conditions  of  the  Cramer-Rao 
lower  bound  are  satisfied,  implying  that  the  minimum  variance  attainable 
for  an  unbiased  estimate  of  w  from  (X^  ,X2)  is  -jyyy. 

Measures  of  information  other  than  the  kind  considered  here  have 

appeared  in  the  literature.  Kullback^  discusses  a  measure  of  the  amount 

of  information  available  for  discriminating  between  two  hypotheses  about 

the  probability  measure  on  the  statistical  space  S.  If  one  considers 

the  information  for  discriminating  between  9  and  9  +  A9,  then  the 

Kullback  and  Fisher  measures  are  closely  related  .  Shannon's  measure  of 
7 

information  is  used  in  communication  theory  to  quantify  the  amount  of 
uncertainty  or  entropy  present  in  a  message  source.  The  more 
uncertainty  (freedom  of  choice)  there  is  in  composing  a  message,  the 

Kullback,  S.  (1959),  Information  Theory  and  Statistics,  Wiley, 

N.Y. 

7.  Shannon,  C.E.  and  W.  Weaver  (1965),  The  Mathematical  Theory  of 

Communication,  The  Univ.  of  Illinois  Press,  Urbana. 
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more  information  the  message  itself  contains.  This  measure  is  quite 
different  from  those  of  Fisher  and  Kullback,  although  some  parallels  can 

be  drawn. ® 

The  concept  of  statistical  information  is  important  in  the 
discussion  of  variance  reduction  since,  as  will  be  shown,  variance 
reduction  techniques  achieve  their  results  by  increasing  the  amount  of 
information  available  and/or  making  more  efficient  use  of  the  available 
information. 


Simulation  Experiments 

In  this  section  a  definition  of  simulation  experiments  is  given. 
Later,  Monte  Carlo  and  sampling  problems  are  shown  to  be  special  cases 
of  this  definition.  The  definition  proves  useful  for  discussion  of 
variance  reduction  by  showing  how  the  random  variables  in  a  simulation 
are  defined,  where  statistical  information  about  the  parameters  of 
interest  is,  and  where  information  may  be  increased  or  lost.  As  will  be 
shown,  VRTs  transform  the  random  variables  in  a  simulation  experiment  to 
increase  and/or  make  better  use  of  information,  so  this  perspective 
captures  the  essential  features  of  variance  reduction.  The  development 
below  is  rather  abstract.  Later  in  the  chapter  examples  are  given  that 
show  how  some  of  the  constructs  usually  encountered  in  simulations  (such 
as  time  and  initial  conditions)  are  contained  in  this  definition. 


Before  beginning,  a  concept  is  introduced  that  will  be  useful 


later.  Consider  three  sets  of  random  variables  U,  V,  and  W.  Let  V  * 

W.  Schutzenberger,  M.P.  (1956)  "On  Some  Measures  of  Information 
Used  In  Statistics,"  in  Information  Theory,  (C.  Cherry,  ed.), 
Academic  Press,  N.Y. 
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variables.  For  instance,  in  the  Monte  Carlo  problem  (2.2)  the  outputs 
are  ordered  Y^,...,Y  corresponding  to  when  the  observations  are 
generated.  Trying  to  relate  successive  rows  in  X  and  Y  to  the  passage  of 
time  is  tempting,  but  is  a  limited  viewpoint  true  only  in  the  simplest 
experiments.  However,  the  order  within  columns  does  correspond  to  the 
order  in  which  realizations  within  that  column  will  be  generated.  Since 
time  is  such  a  common  construct  in  simulation,  it  is  worthwhile  to 
consider  briefly  how  it  is  incorporated  into  the  definition  given  above. 
There  are  two  cases:  time  advanced  in  fixed  increments  and  at  .random 
event  times. 

When  time  is  advanced  in  known,  fixed  increments,  At,  the  time 
increment  is  part  of  the  definition  of  an  output  transformation.  The 
clock  time  after  i  increments  is  a  transformation  of  the  previous  clock 
tim.  t._r  namely, 

‘i  '  ■“>  ‘  *1-1  *  ** 

Thus,  the  clock  time  is  an  output,  and  would  occupy  a  column  in  Y  (if  it 
is  essential). 

In  "discrete  event  simulation"  (see  for  instance,  Fishman,  1978, 
Pritsker  and  Pegden,  1979,  Law  and  Kelton,  1982,  Banks  and  Carson,  1983, 
or  Bratley,  Fox,  and  Schrage,  1983)  clock  time  advances  in  discrete, 
random  steps  between  the  occurrence  of  events.  Usually  the  probability 
distribution  of  the  interval  is  not  explicitly  known  and  advances  are 
generated  by  a  complicated  transformation  of  other  random  variables.- 
For  example,  in  a  large  queuing  network  with  many  servers,  service 
centers,  and  customer  types,  the  interaction  of  various  service  times, 
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Z 


and  9  *  [8  ]  is  the  parameter  of  interest.  Replacing  A,  S,  Q,  9,  W,  W 
with  appropriately  subscripted  elements  of  X,  Y  and  Z  is 
straightforward,  and  g  is  given  by  equation  (3-4),  which  is  written  in  a 
form  like  (3.2).  The  feasible  region  for  I  -  [i  ,1  ]  is 

W  (J 

R  »  l (n,n) :  n  -  1,2,...} 

since  generation  of  a  realization  of  W  requires  a  corresponding  Q.  If 
the  experiment  requires  a  sample  of  500  waiting  times,  then 
H*  -  l 500,  500}. 

As  a  second  example,  consider  the  general  Monte  Carlo  estimation 
problem  of  Chapter  2,  specifically  equation  (2:2).  In  that  example,  8, 
f,  g,  X,  and  Z  correspond  to  the  similar  terms  defined  above.  Now  let 


h(Y) 


Y. 

l 


where  Y  -  g^X^.  The  function  h  is  defined  by  its  functional  form  (a 
summation  of  terms  divided  by  the  number  of  terms  in  the  sum)  and  the 
particular  argument,  Y^.  Note  that  I  *  n  can  be  any  arbitrary  positive 
integer,  and  the  arguments  of  h  all  form  a  single  Y  column. 


Time 

Many  simulation  experiments  explicitly  account  for  the  passage  of 
time,  and  all  simulations  have  some  underlying  ordering  of  their  random 
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the  system  be  denoted  by  A,  a  random  variable  distributed  exponentially 
with  some  known,  fixed  rate.  Let  the  service  time  be  S,  distributed 
exponentially  with  parameter  0,  where  0  is  a  function  of  Q,  the  number 
of  customers  in  the  queue  at  the  beginning  of  service.  Time  in  the 
queue  for  the  i  customer,  VL,  is  given  by  the  well-known  relation: 

Wi  -  max(0,  W._1  +  (5.4) 

An  estimator  of  0  is 
w 


where  n  is  the  number  of  customers.  Based  on  the  definitions  above,  the 


input,  output,  and  statistic  matrices  are: 
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Definition:  E  and  E'  are  everywhere  equivalent  (E  ®  E')  if  and  only 


if 


g(x)  ■  c(g'(x))  R*  -  R'» 

and 

h(y)  ■  h'(y')  V  realizations  y,  y*  generated  from  common  x 

Clearly,  everywhere  equivalence  implies  strong  equivalence. 

Considering  all  experiments  based  on  some  given  n  and  6,  the 
definitions  of  equivalence  partition  the  simulation  experiments  into 
equivalence  classes.  A  VRT  transforms  an  experiment  into  another 
experiment  with  the  same  context,  but  hopefully  one  that  is  not  d- 
equivalent  and  in  fact  has  statistics  with  better  estimation  properties. 
Usually  d  and  s-equivalence  are  conditions  that  cannot  be  verified. 
However,  a  VRT  should  yield  an  experiment  that  is  not  e-equivalent  to 
the  original  one.  Since  e-equivalence  is  the  finest  partition, 
characterizing  the  ways  in  which  simulation  experiments  are  transformed 
into  other,  non  e-equivalent  experiments  characterizes  the  ways  to 
transform  them  into  non  s  and  d-equivalent  experiments  as  well. 


Examples  and  Common  Constructs 

Consider  a  simulation  model  of  a  single  server,  first-come-first- 
served  queuing  system  that  is  used  to  estimate  9^  ,  the  expected  time  a 
customer  spends  waiting  for  service.  Let  the  time  between  arrivals  to 
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realizations. 

Beyond  this  trivial  distinction,  definitions  of  three  types  of 
equivalences  between  experiments  will  be  given.  To  be  equivalent  in  any 
sense  used  here,  experiments  must  be  based  on  the  same  sample  space  ft 
and  have  the  same  parameters  of  interest,  0.  Let  E  and  E*  be  two 
simulation  experiments  with  common  context  (ft, 0). 

Definition;  E  and  E*  are  equivalent  in  distribution  (E  ^  E’)  if  and 
only  if 


Since  the  ultimate  goal  is  to  estimate  0  via  the  statistics  Z,  if 
two  experiments  are  equivalent  in  distribution  they  have  the  same 
statistical  properties.  However,  the  distribution  of  Z  is  not  generally 
known,  while  the  following  may  be: 

Definition:  E  and  E'  are  strongly  equivalent  (E  ®  E')  if  and  only 


if 


h(g(x))  ■  h'(g' (x))  R#  *  R'#  V  realizations  x  of  X 

Clearly,  strong  equivalence  implies  equivalence  in  distribution. 

Definition:  g  and  g'  are  equivalent  except  for  coding  (g  ®  g' )  if 
there  exists  a  one-to-one  mapping  c  such  that  g(x)  -  c(g'(x)). 

The  next  definition  of  equivalence  is  at  the  level  on  which  random 
variables  in  the  experiment  are  actually  defined. 
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The  preceding  definitions  imply  that  f  ,  (g;R*)  and  h  are 
unambiguously  defined  (up  to  equivalent  essential  sets)  for  a  given 
simulation  experiment. 


Equivalence  of  Experiments 

In  the  preceding  section  a  definition  of  simulation  experiments  was 
presented  with  the  idea  that,  given  a  particular  experiment,  it  is 
possible  to  partition  the  random  variables  into  matrices  X,  Y  and  Z,  and 
to  (at  least  implicitly)  identify  f^,  (g;R#),  and  h.  It  has  already 
been  noted  that  the  identification  is  not  unique,  since  for  a  given 

experiment  it  is  possible  to  reorder  the  column  indices  of  the  variables 

•  * 

and  to  have  different  values  of  t  and  k  without  changing  the 

experiment.  Distinguishing  between  experiments  that  are  identical 

*  • 

except  for  the  order  of  their  subscripts  or  values  of  t  and  k  is  not 
necessary.  Specifically,  consider  an  arbitrary  sequence  [a^J 

i  -  *  Also  consider  two  subsequences  jb^J  and  jc^J  with  the 

property  that 

i b± }  n  jcj  -  + 

and 

{bjJ  U  UJ  -  {ai{ 

for  all  values  of  1&.  Then  for  the  purposes  of  this  research  the  two 
representations  of  the  sequence  are  equivalent.  Identify  sequences  such 
as  { f i  an<*  Ig^ji  with  ja^j.  Order  in  the  sequence  and  subsequences  is 
only  constrained  by  the  correspondence  to  order  of  generation  of 


v  V ' 


<000  0*. /.  v.v. v  v  v v  , 


•**  /,  *  ,  *  .  *  4  .  ‘  '  - 
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transformations  of  elements  in  X,  the  relative  lengths  of  the  different 
sequences  being  subject  to  some  restrictions.  The  vector  Z  is  defined 
by  functions  of  the  columns  of  Y  that  contain  information  about  0. 


Axioms  of  Simulation 

A  simulation  experiment  is  composed  of  the  sets  of  random  variables 
X,  Y,  and  Z,  which  are  defined  by  f  ,  (g;R#),  and  h,  respectively. 
However,  the  existence  of  such  definitions  does  not  in  itself  constitute 
a  simulation.  There  are  two  necessary  axioms  that  must  be  acknowledged 
before  X,  Y,  and  Z  represent  a  simulation  experiment: 


Axiom  1  (Existence  of  Information):  The  random  variables  X,  Y,  and  Z 
have  probability  distributions  that  depend  on  0.- 


Axiom  2  (Existence  of  Realizations):  Given  a  source  of  randomness, 

realizations  of  X,  Y,  and  Z  can  be  (perhaps  iteratively)  generated. 


Axiom  1  guarantees  that  estimation  of  0  from  realizations  of  X,  Y,  and  Z 
is  not  fruitless.  Axiom  2  is  more  subtle;  it  establishes  that  the 
dependencies  between  and  within  X  and  Y  are  such  that  R  is  not  empty. 
This  restriction  limits  the  potential  transformations  of  an  experiment. 

Definition:  A  simulation  experiment,  denoted  by  E(fu, (g;R*) ,h; Q, 0) , 
is  specified  by  a  context  ( G, 0),  a  probability  distribution  f^  on  Q, 
output  transformation  and  sampling  plan  (g;R#),  and  statistics  function, 
h,  subject  to  Axioms  1  and  2.  Where  there  will  be  no  confusion,  the 


shortened  notation  E  will  be  used. 
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Z  -  h(Y) 


* 

&  *  1  f  2  f  •  •  •  t  m. 


(3-3) 


For  each  m,  hffl  is  a  transformation  from  a  set  of  output  sequences  Y^ 

* 

to  Z  •  These  functions  do  not  depend  on  9  or  I  .  The  dimension  m  is 
m  m 

known,  finite,  and  equals  the  dimension  of  9. 

The  statistical  space  induced  by  h  is  Sz  =  (r,2,fh),  where  again  f^ 
is  usually  not  known,  or  if  known  is  not  used  to  generate  realizations 
of  Z. 

Notationally,  let: 


h 


and 


The  mth  element  in  Z  is  defined  by  the  function  h  and  its  argument 

m 

Y/  ^ .  Thus,  the  functional  forms  of  h  and  h’  might  both  involve  a 
vm;  mm 

summation  of  squared  tenns  even  though  (m)  ^  (m)'.  Bach  h^  is  defined 

for  arbitrary  length  of  the  sequences  Y ^ . 

The  statistics  aggregate  output  random  variables.  Note  that  each 

element  9  in  9  has  a  corresponding  estimator  Z  in  Z.  Often  h  can  be 
m  m  m 

thought  of  as  being  computed  in  stages,  defining  certain  intermediate 
random  variablet  from  Y  and  then  combining  them  into  a  final  estimator. 

To  summarize,  X  is  the  matrix  of  all  random  variables  in  the 
experiment  whose  probability  distributions  are  known,  in  the  sense 
stated  above.  Elements  of  Y  are  obtained  by  application  of  sequences  of 
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defined  on  X  may  not  be  unique.  A  useful  restriction  that  can  be 
imposed  without  loss  of  generality  is:  A  column  Y  c  Y  will  represent  1) 

Mi 

an  argument,  considered  as  a  whole,  for  one  or  more  statistics,  Z  (see 
below),  and/or  2)  a  sequence  B^  that  parameterizes  f^  for  one  or  more 
k,  and/or  3)  an  output  needed  to  specify  the  sampling  plan,  R#.  Thus, 
columns  are  delimited  by  the  purpose  they  serve.  Note  that  1 )  -  3) 
define  the  essential  random  variables;  random  variables  that  are  either 
arguments  for  the  statistics,  necessary  for  generation  of  realizations 
of  X,  or  determine  when  the  desired  sample  has  been  obtained. 
Information  about  fl  may  be  lost,  but  not  gained,  in  the  transformation 
from  to  Sy  (see  properties  of  information  above).  The  matrix  Y  is  a 
minimal  set  containing  all  the  useful  information  about  0. 

An  alternative  representation  of  (3.1)  is 

hi  ■  (5-2> 

where  [x‘,Y]  is  the  matrix  obtained  by  including  all  columns  of  X  and  Y. 
This  notation  may  seem  more  natural  in  the  simulation  context,  since 
realizations  of  outputs  are  often  generated  iteratively  from 
transformations  of  Xs  and  previous  Ys.  However,  since  all  elements  of  Y 
are  ultimately  functions  of  X,  (3.1 )  is  completely  general  and  has  the 
advantage  that,  given  a  particular  experiment,  X^^  is  unique. 

Definition:  Z,  the  row  vector  of  statistics,  has  real  valued, 
scalar  random  variables  as  elements  that  are  estimators  of  0.  The 
statistics  are  specified  by  h,  where 
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from  a  set  of  elements  X/.„s  to  Y.  These  transformations  do  not 

Vil; 

* 

depend  on  9.  The  region  R  is  an  i  dimensional  feasible  region  that 
accounts  for  the  interrelationship  between  the  individual  g.  s;  R  is 

X  x 

determined  by  all  possible  realizations  of  X.  The  sampling  plan  R#  is 

specified  so  that  I  6  Rf  with  probability  one  when  realizations  are 

» 

generated.  The  column  dimension  1  is  finite  and  does  not  depend  on  I. 
The  statistical  space  induced  by  g  is  S  -  (*,¥,f  )•  The 

y  s 

probability  distribution  f  is  usually  not  known,  or  if  known  is  not 

& 

used  to  generate  realizations  of  Y.  It  is,  however,  naturally 
parameterized  by  9  so  that  estimation  of  9  is  possible  from  realizations 
of  Y. 

Notationally,  let: 


and 


Y  - 


The  row  index  i  corresponds  to  the  order  in  which  realizations  are 

* 

generated  in  an  output  sequence.  The  number  of  sequences,  A  ,  is  not 
uniquely  determined  since  some  sequences  could  be  divided  or  merged  and 
still  satisfy  the  definition  just  given.  Also,  the  essential  set 
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all  that  is  required  is  that  some  joint  or  marginal  distribution 
including  X^  be  known,  f  is  the  vector  of  all  marginal  distributions 
of  f^.  A  special  case  of  an  element  of  X  is  a  constant  selected  by  the 
experimenter  from  a  distribution  known  to  him  (implicitly  or 
explicitly) . 

The  lengths  of  the  sequences  [X^}  are  infinite,  but  the  number  of 
realizations  is  determined  by  the  frequency  of  sampling  from  { f iic)  *  The 

index  i  corresponds  to  the  order  of  sampling.  The  number  of  input 

*  * 

sequences,  k  ,  is  finite  and  known.  However,  k  is  not  uniquely 

determined  since  some  sequences  could  be  divided  or  merged  and  still 
satisfy  the  definition  just  given. 

In  later  discussions  it  will  be  necessary  to  work  with  conditional 


distributions  of  f  . 

u 

denoted 


The  distribution  of  X,  ,  Cj  given  X,.t\  c  X  is 


fUk)!(jt)(l(ik)ll(jt)> 

where  the  parameter  t  has  been  suppressed.  A  shorthand  version  is 

f(ik)!(jir 

Definition:  Y,  the  matrix  of  outputs,  has  real  valued,  scalar 
random  variables  as  elements  and  is  an  essential  set  of  all  random 
variables  defined  on  X.  The  outputs  are  specified  by  (g;R*),  where 

*  -  g(X)  -  Uil(X(il))}  i  *  1,2 . It  1-1,2 . *  (3.1) 
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multivariate)  random  variable  with  unknown  distribution  (see  below). 

The  associated  statistical  space  is  *  (a,X,f^),  where  8  is  the 
sample  space  for  X,  £  is  a  o-algebra  of  subsets  of  0,  and  f  is  the 
probability  distribution  on  8. 

All  of  the  information  in  the  simulation  experiment  for  estimation 
of  0  is  contained  in  the  statistical  space  S^.  However,  the  probability 
distribution  f  is  not  "naturally  parameterized"  by  0,  which  means  that 
0  is  some  unknown  or  complicated  function  of  parameters  of  f^,  making 
estimation  of  0  directly  from  realizations  of  X  difficult.  Thus, 
transformation  of  Sx  into  a  space  whose  induced  probability  distribution 
is  conveniently  parameterized  by  0  is  desirable.  Notationally,  let: 


and 


£ 

The  *  ^fik^xik'  ®(ik)^  are  k  sequences  of  scalar  marginal 

distributions  of  f^.  For  fixed  k,  they  are  identical  for  all  i  expect 
possibly  for  the  value  of  8^)  c  0.  Each  element  X ^  has  marginal 
distribution  f^,  but  statistical  dependencies  can  exist  within  or 
between  column  elements.  Note  that  f  ^  may  only  be  knowu  implicitly; 


.  'V-' 


. '  -  *\ 


■  ■  -  *  »  "_*i  r-M  *  1 
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carried  out  on  a  computer,  these  assumptions  are  not  restrictive. 

Context  of  the  Experiment 

Definition;  9,  the  row  vector  of  parameters  of  interest,  has 

unknown,  but  fixed,  real  scalar  constants  as  elements,  and  has  dimension 

* 

m  .  The  purpose  of  performing  a  simulation  experiment  is  to  estimate  9. 

Note  that  9  is  fixed,  but  other  elements  of  the  simulation 
experiment  are  not  (see  below).  Although  9  ia  just  a  vector  of 
constants,  it  has  a  context  given  by  8,  a  sample  space  sampled  from  to 
estimate  9. 

Definition:  0,  the  sample  space  of  the  inputs,  is  a  subset  of  some 
infinite  dimensional  Euclidean  apace;  8  is  the  intersection  of  the  set 
of  all  sample  spaces  that  can  be  sampled  from  according  to  known 
probability  distributions  (see  below),  and  the  set  of  all  sample  spaces 
whose  sampling  distributions  contain  information  about  9. 

Note  that  8  is  a  fixed  space  that  will  be  sampled  from  according  to 
a  known  distribution.  It  is  possible  that  some  subsets  of  8  have 
probability  zero.  There  will  be  other  sample  spaces  in  the  simulation 
experiment  induced  by  transformations  of  8. 

Definition  of  the  Experiment 

Definition:  X,  the  matrix  of  inputs,  has  real  valued,  scalar  random 
variables  as  elements  and  known  multivariate  probability  distribution. 
X  fw(x!0).  Here,  known  distribution  means  that  f  is  specified  by  an 
analytic  or  tabular  expression  with  parameter  6,  a  real  (possibly 
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V(u)  and  W  *  W(v),  where  u  and  v  are  realizations  of  U  and  V, 

respectively. 

Definition:  A  subset  V*  £  V  is  an  essential  set  defined  on  U  if  1 ) 
for  each  element  7q  of  V  there  exists  a  known  transformation  from 
elements  in  V’  to  Vq,  and  the  transformation  does  not  depend  on  the 
probability  distribution  of  V,  and  2)  for  any  element  VQ  €  V',  no  such 
set  of  transformations  exists  for  V'  -  Vq. 

Thus,  W  can  be  defined  a3  a  transformation  of  V'  alone,  but 
deletion  of  any  element  of  V'  means  V  may  not  be  defined.  Note  that  the 
essential  subset  may  not  be  unique.  For  example,  suppose  U  is  a  scalar, 
and  V1  -  U,  while  -  U2.  Let  V  ■  V,)  +  Then  V'  *  }  ,  unless 

it  is  known  that  U  >_  0,  in  which  case  V'  could  be  either  jV1 }  or  {V2}. 

As  a  second  example,  let  U  «  {Uj,U2},  and  V  *  fV^Vg}  where 

2  U2 
V|  .  u,2  and  v2  -  -ij 

V2 

Then  if  W  -  V’  -  V. 

As  a  less  abstract  example,  consider  the  simulation  of  a  service 
system  and  three  random  variables  associated  with  each  customer:  waiting 
time,  service  time,  and  total  delay.  As  essential  subset  is  any  two  of 
the  three,  since  given  two  the  third  can  be  derived  by  simple 
arithmetic. 

In  the  development  that  follows,  all  sample  spaces  are  subsets  of 
Euclidean  spaces  of  some  dimension,  and  all  random  variables  defined  on 
them  are  real  valued.  Also,  all  transformations  are  Lebesgue  measurable 
and  integrable.  Since  the  concern  is  for  experiments  that  can  be 
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interarrival  times,  and  queuing  disciplines  determine  when  the  next 
event  will  occur.  Clearly,  the  time  step  is  a  member  of  Y.  The  clock 
time  may  be  in  Y  if  it  is  essential. 


Initial  Conditions 

Two  long  standing  issues  in  the  simulation  of  systems  in  steady 
state  are  the  specification  of  initial  "startup"  conditions  and  the 

estimation  of  steady  state  parameters  from  outputs  that  may  be 

9 

contaminated  by  the  chosen  conditions.  The  estimation  problem  is  a 
question  of  what  statistic  to  choose.  Initial  conditions  are  often 
constants,  chosen  for  convenience  ("empty  and  idle")  or  because  they  are 
expected  to  be  consistent  with  the  steady  state  distribution.  They  may 
also  be  selected  randomly  from  a  known  distribution.  In  either  case 
they  would  be  classified  as  inputs  in  the  simulation  experiment. 

As  an  example,  consider  a  simulation  that  generates  outputs 
described  by 


Y.,  -  «Y.  .  .  +  X., 

il  1-1  ,1  i2 


an  autoregressive  process  of  order  one,  where  X^  i  m  1,2,...  are 
identically  distributed  with  some  known  distribution.  If  the  outputs 
are  to  begin  with  Y1  ,  then  an  initial  value  for  Y^  must  be  given.  Let 
that  value  be  X^ ,  either  a  constant  or  a  random  variable  with  a  known 
distribution.  Then  the  matrices  of  inputs  and  outputs  look  like: 


9.  Wilson,  J.R.  and  A.A.B.  Pritsker  (1978),  "A  Survey  of  Research 
on  the  Simulation  Startup  Problem,"  Simulation,  31,  55-58. 


where  «c  is  an  indicator  function  that  equals  1  or  0  if  condition  c  is 
true  or  false,  respectively. 

Aside:  This  case  should  not  he  confused  with  using  a  time  series 
algorithm  to  generate  random  variables  with  a  known  distribution;  if  the 
distribution  is  known  then  the  random  variables  are  inputs,  no  matter 
what  method  we  use  to  generate  realizations.  For  example,  if 
Yl 1 ,Y2i  , . . . ,Yqi  have  a  known  multivariate  normal  distribution  and  are 
generated  as  such,  then  they  become  ,X2^ , . . . . 
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Stopping  Rules 

In  the  examples  given  earlier,  the  sampling  plan  R#  was  specified 
by  a  number,  say  n,  of  observations  of  a  particular  output.  This  was 
easily  denoted  by  a  single  element  of  R.  Two  other  cases  are  possible. 

It  may  be  that  R#  is  a  region,  rather  than  a  point,  in  which  case 
sampling  stops  when  I  is  first  contained  in  R*.  For  instance,  suppose 
that  in  a  network  queuing  simulation  the  run  is  terminated  when  station 
1  has  serviced  50,  and/or  station  2  has  serviced  60  customers.  If  I  * 
[I,  ,I2],  then 

R,  -  U1  ,l2'-  150  or  I2  >_  60  and  I  g  R} 

The  second,  more  interesting  case  is  stopping  rules  based  on 
satisfying  a  condition  other  than  a  count.  A  simulation  run  that 
terminates  when  the  clock  time  is  480  minutes,  or  one  that  stops  when  a 
resource  is  depleted  are  examples.  Situations  such  as  these  can  always 
be  represented  by  an  output  variable  whose  realization  indicates  that 
the  stopping  condition  has  been  met.  For  convenience  denote  this  output 

by  Y  and  write 

l 

R*  -  {Is  I  *  -  1  and  I  €  R} 

1 


Sequential  Procedures 

In  simulation  experiments,  as  well  as  in  general  statistical 
experiments,  sequential  procedures  may  be  employed  to  estimate  a 
parameter  of  interest.  Such  procedures  are  characterized  by  the 


47 


selection  of  an  initial  sample,  analysis  of  the  results,  and  a  decision 
to  continue  sampling  or  stop  based  on  the  results  of  the  analysis. 

Considering  such  procedures  in  light  of  the  definition  of 
simulation  experiments,  modelling  sequential  sampling  might  at  first 
appear  to  require  some  sort  of  " feedback"  or  control  structure  from  the 
statistics  (h(Y))  to  the  output  transformations  (g(X)).  However,  it  is 
our  conscious  intention  to  make  a  distinction  between  the  "design"  and 
the  "analysis"  aspects  of  simulation  experiments.  Sequential  procedures 
affect  the  sampling  plan,  R#,  and  thus  are  a  part  of  the  design  .aspect 
(as  are  the  inputs).  In  fact,  sequential  procedures  are  simply  a  kind 
of  stopping  rule,  as  discussed  earlier.  The  statistics  are  the  analysis 
part  of  the  experiment,  and  will  always  be  functions  of  a  fixed  pool  of 
data.  While  this  is  not  the  only  possible  perspective,  it  seems 
justified  since  restrictions  on  sampling  are  embodied  in  f  and  g. 

Joint  Distributions  of  the  Inputs 

As  stated  earlier,  the  elements  in  X  can  have  any  feasible  joint 
distribution.  Often  the  elements  in  a  column  are  independent, 
identically  distributed  random  variables  indexed  by  the  order  in  which 
realizations  are  sampled.  When  encountering  statistical  dependencies, 
however,  two  types  are  common:  Identically  distributed  multivariate 
vectors  where  each  element  in  the  vector  has  a  different  marginal 
distribution,  and  identically  distributed  scalar  random  variables  that 
are  pairwise  (or  in  general  m-tuplewise)  correlated.  The  first  type 
would  be  represented  by  columns  of  scalars  where  corresponding  (same  row 
index)  elements  have  known  joint  distribution.  The  second  case  is 


denoted  by  a  single  column  where  the  row  indices  indicate  the 
correlation  structure.  Remember,  also,  that  if  the  distribution  is  not 
completely  specified,  then  the  unknown  parameters  for  marginal 
distributions  f^  are  given  by  one  or  more  columns  in  Y. 

Confidence  Intervals 

In  this  research  the  assumed  goal  of  the  simulation  experiment  is 
to  derive  point  estimates  of  unknown,  real  parameters.  However,  outputs 
from  simulation  experiments  are  often  used  to  construct  confidence 
intervals  on  these  parameters.  Variance  reduction  and  confidence 
interval  construction  are  related  because  the  properties  of  the  interval 
are  generally  a  function  of  the  properties  of  the  point  estimator(s) . 
Thus,  while  attention  is  restricted  to  point  estimates  and  their 
variance  the  research  is  relevant  to  confidence  interval  construction. 

Types  of  Statistics 

The  statistics  defined  by  a  simulation  experiment  can  be  separated 
into  two  types  based  cn  the  outputs  that  are  their  arguments. 
Observational  statistics  aie  based  on  individual,  discrete  outputs 
without  relation  to  when  the  output  was  generated.  Time-persistent 
outputs  have  values  defined  over  time,  and  require  not  only  the  value 
but  also  the  time  period  over  which  it  persisted  ("time"  can  be  any 
index).  In  the  definition  of  simulation  the  distinction  is  irrelevant,* 
since  both  types  of  outputs  are  represented  by  scalar  random  variables. 
Also,  it  does  not  matter  whether  the  outputs  came  from  replication 
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"runs"  (usually  implying  independent  replications)  or  a  single  steady 
state  "run";  columns  in  Y  can  represent  either  kind  of  output.  Of 
course,  the  particular  estimators  used  will  depend  on  both  these 
factors. 


Extension  to  Realizations 

The  definition  of  simulation  experiments  given  above  can  be  easily 
extended  to  describe  how  realizations  of  the  experiment  are  generated. 
A  basic  source  of  randomness — usually  scalar,  independent,  identically 
distributed  U(0,1)  random  variables — is  transformed  into  random 
variables  X  distributed  according  to  f^.  The  transformations  g  and  h 
are  employed  to  yield  a  realization  of  Y  and  Z  from  a  realization  of  X. 
Of  course,  simulations  are  usually  implemented  as  computer  algorithms. 
Note  again  that,  when  considering  variance  reduction,  the  interest  is  in 
how  random  variables  are  defined  and  not  how  realizations  are  actually 
generated,  although  the  method  of  generation  will  often  affect  what  can 
be  achieved  in  practice. 


Monte  Carlo  and  Sampling  Experiments 

The  general  Monte  Carl"  integration  problem  fits  easily  into  the 
simulation  characterization.  Since  the  problem  is  that  of  evaluating  a 
known  integral,  and  since  the  problem  can  be  made  stochastic  by 
introducing  a  probability  distribution  into  the  integral,  the  defining  f 
distributions,  g  transformations,  and  h  functions  are  easily  identified 
and  can  be  expressed  in  closed  form  (see  the  earlier  example).  However, 


even  the  solution  of  problems  like  (2.1)  by  numerical  or  quasi-Monte 
Carlo  methods  (Hammersley  and  Handscomb,  1964)  is  covered  by  the 
definition.  In  those  situations,  the  points  at  which  the  integrand  is 
evaluated  (the  inputs)  are  known  constants.  Thus,  there  is  zero 
variance;  independent  realizations  of  the  experiment  will  all  yield  the 
same  estimate.  However,  the  accuracy  of  the  method  (difference  between 
the  estimate  and  the  true  answer)  will  not  be  zero  in  general. 

That  survey  sampling  problems  can  be  characterized  as  above  is  less 
obvious.  However,  in  the  usual  case  of  probability  sampling  (Cochran, 
1977)  the  relationship  can  be  demonstrated.  In  probability  sampling  a 
set  of  distinct  samples  from  a  (usually  finite)  population  that  the 
sampling  procedure  is  capable  of  selecting  is  defined,  and  each  possible 
sample  is  assigned  a  probability  of  selection.  One  of  the  samples  is 
selected  with  likelihood  given  by  this  probability,  and  an  estimate  of 
whatever  quantity  is  of  interest  is  made  from  the  responses  given  by  the 
elements  in  the  sample.  In  terms  of  statistical  spaces,  the  triple  (a, 
*,  f^)  corresponds  to  the  possible  samples,  the  events,  and  the 

probabilities  assigned  to  each  possible  sample.  The  most  common 
sampling  distribution  is  simple  random  sampling,  where  each  possible 
sample  is  equally  likely  to  be  selected.  Sampling  with  or  without 
replacement  are  two  procedures  for  generating  such  samples,  depending  on 
whether  the  same  element  can  appear  in  a  sample  more  than  once.  The  g 
transformations  that  induce  the  space  ( ♦,  ¥,  f  )  are  more  implicit, 

o 

representing  how  responses  from  the  sample  are  obtained  and  the 

allocation  of  sampling  effort.  The  space  S  is  as  before,  representing 

z 

the  estimators  employed. 
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Consider,  for  example,  an  experiment  to  determine  the  expected 
lifetime  of  a  type  of  light  bulb.  The  procedure  might  be  to  take  a 
sample  of  size  n  from  a  lot  of  bulbs  in  such  a  way  that  each  possible 
size  n  sample  is  equally  likely,  burn  the  bulbs,  record  the  time  until 
burnout,  and  estimate  the  expected  life  by  the  average  of  these  values. 
The  population  of  light  bulbs  and  the  sampling  procedure  define  X,  the 
sample  size  n  and  the  method  of  establishing  lifetimes  defines  Y,  and  2 
is  defined  by  the  estimation  rule  (simple  average)  and  the  outputs,  Y. 


Analytic  Solutions 

Consider  a  given  simulation  experiment,  E.  Given  sufficient 
insight,  it  may  be  that  the  value  of  0  can  be  deduced  analytically. 
Such  a  solution  procedure  is  not  outside  the  scope  of  this  research. 
Think  of  a  continuum  between  the  original  stochastic  experment  and  an 
analytic  determination  of  0,  specified  by  the  precision  of  the 
statistics  Z  in  the  experiment.  Then  VRTs  transform  a  given  experiment 
into  another  one  in  this  continuum.  An  analytic  solution,  of  course,  is 
the  limiting  case  having  infinite  precision. 


CLASSES  OF  TRANSFORMATIONS 


Given  the  definition  of  simulation  experiments  in  the  previous 
chapter,  this  chapter  develops  a  framework  for  VI. Ts  based  on  six  classes 
of  transformations  of  simulation  experiments.  The  six  classes  are 
defined  in  the  next  section.  After  defining  the  classes  it  will  be 
shown  that  they  generate  all  possible  VRTs  via  composition,  that  they 
are  disjoint  classes,  and  that  they  are  useful  for  achieving  a  variance 


reduction.  Uniqueness 

of  the 

particular 

partitioning 

of 

the 

transformations  is  not 

claimed. 

Trying 

to 

relate  each 

class 

of 

transformations  directly 

to  a  well- 

-known  VHT 

is 

tempting,  but 

misses 

the 

point.  A  transformation  redefines  an  experiment  in  a  way  that  may  be 
favorable;  it  is  not  necessarily  a  VRT  nor  does  its  use  imply  a  variance 
reduction.  The  last  two  sections  discuss  subclasses  of  transformations 
within  the  original  six  and  explain  the  relationship  between  the  six 
classes  of  transformations  and  information  for  estimation  of  0. 


Definitions  of  the  Transformations 

Recall  that  VRTs  attempt  to  increase  information  and/or  make  better 
use  of  information  for  estimation  of  0  in  a  simulation  experiment. 
Three  of  the  classes  affect  the  amount  of  information,  while  the 


•  . 


l  V.  A.  *  < 
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remaining  three  concern  the  use  of  information.  The  six  classes  of 
transformations,  along  with  some  additional  refinements  that  will  be 
discussed  later,  are: 

Amount  of  Information 

Distribution  Replacement  (DR) 

Dependence  Induction  (Dl) 

(independent  case) 

(dependent  case) 

Sample  Allocation  (SA) 

(series) 

( replication) 

Use  of  Information 

Equivalent  Allocation  (EA) 

Equivalent  Information  (El) 

Auxiliary  Information  (Al) 

(about  0) 

(about  Z) 

Definition:  The  experiment  set,  ES(8, 0),  is 

Ea(n,e)  -  U  E(fu,(g;R»),h; n,e) 

where  the  union  is  over  all  (fu,(g;R#) ,h)  for  a  fixed  context  (fl,e)  such 
that  Axioms  1  and  2  are  satisfied. 

The  definitions  below  establish  classes  of  transformations  with 
domain  and  range  E3( 8, 0)  for  fixed  (8,0).  These  transformations  map  a 
simulation  experiment  into  another  non  e-equivalent  (but  possibly  s  or 
d-equivalent)  experiment  in  the  same  experiment  set.  A  transformation 
will  be  denoted  by  T,  possibly  with  subscripts.  If  a  transformation  is 
defined  as  altering  the  definition  of  f^,  g,  or  h  alone,  then  the 


remaining  components  are  unchanged.  Note  that  any  experiment  set  has 
six  classes  of  transformations  associated  with  it. 

In  this  chapter,  if  two  distributions  are  not  equal  then  they  are 
not  equal  on  a  region  of  positive  probability. 

Transformation  of  the  Inputs 

Distribution  Replacement  (DR):  €  DR  if  and  only  if 

V  f.w ->»•.<«) 

such  that 

f*  (x)  t  f  (x)  for  some  x 
u  u 

and 

f' 

f'  4ul  t  \  “  f.ul/  x  -5“^  V  ik 

ik|(c)  iki(c)  fik 

where  is  the  probability  distribution  of  Xifc  given  X  - 

Dependence  Induction  ( DI ) :  €  DI  if  and  only  if 

V  ->  f'«U) 


such  that 
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ik 


fik 


V  ik 


Transformation  of  the  Outputs 

Equivalent  Allocation  (EA):  6  BA  if  and  only  if 

T^:  g(x)  ->  g'(x) 

such  that 

g'(x)  ^  g(x)  for  some  x 

and 


R'#  -  R# 

Sample  Allocation  (SA):  T4  €  SA  if  and  only  if 

T  ;  g(x)  ->  g'(x) 

such  that 


R’#  +  R. 


and 


tg’ilCx)}  !  }gu(*)l  V  il,  X 


(Recall  that  C  means  equal  except  for  coding;  see  Chapter  5) 


•"i 

£ 
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msformation  of  the  Statistics 

Equivalent  Information  (El):  6  El  if  and  only  if 

T<.:  h(y)  ->  h'(y) 

3h  that 

h'(y)  /  h(y)  for  some  y 


(m)'  *  (m)  V  m 


Auxiliary  Information  (AI):  €  AI  if  and  only  if 


Tg?  h(y)  ->  h'(y) 


ch  that 


(m)'  y  (m)  for  some  m 


d 


h’m(y)  •  hm(y)  V  m,  y 

Notice  that  the  transformations  for  each  set  X,  Y,  and  Z  are 
rallel,  and  each  one  changes  the  definition  of  scalar  random  variables 
the  sets. 


-  «•-  *  -  m..*  •-  '-V_  <-  jT-V.  •_  •. 


1 


A 


*i 


- 


-i 


-j 
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>erties 

Before  proceeding  to  the  main  results  in  the  next  section,  three 
»as  establishing  properties  of  the  classes  DR  and  DI  are  proved.  In 
ie  proofs  and  those  in  the  remainder  of  the  chapter,  all  probability 
tributions  are  assumed  to  be  integrable,  and  all  integrals  are  over 
entire  domain  of  the  variable  qf  integration.  If  the  distribution 
discrete,  then  the  integral  would  be  replaced  by  a  summation  over  all 
sible  values  in  the  domain  and  the  proof  would  proceed  as  given.  In 
e  of  the  proofs  there  are  ratios  of  distributions  that  could  have 
o  denominators  for  some  values  of  the  function  arguments.  For  all 
h  values  of  the  arguments  the  ratio  is  also  multiplied  by  zero. 
>se  situations  will  be  left  undefined  and  attention  restricted  to 
ues  of  the  arguments  where  the  denominators  are  nonzero. 

una  1  :  For  all  T  €  DR,  T:  f  ->  f' 

-  u  u 

fik|(c)  ‘  f'iki(c)  T~~  V  lk 

>of i  by  the  definition  of  the  class  DR  and  simple  algebra. 

qed 

ima  2:  T  €  DR,  T:  f  ->  f*  if  and  only  if 
-  11)0)  J 
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1  3  1 

W,  *  -5T—  w  =  w .  =  — 

1  2n  n  2n  in 

it  is  believed  the  dice  "warm  up.") 
sater  variance 


i  -  2,3, • • • i n— 1 

In  this  case 


the 


estimator 


Var[z 


.052  .130 
a  -  +  — - 

n-2  2 

n 


",  going  from  Z  .  back  to  Z  is  an  example  of  an  effective  use  of 


qed 


sses  of  Transformations 

hia  section  briefly  discusses  some  interesting  subclasses  within 
ix  classes  of  transformations.  These  refinements  are  useful  in 
ce,  and  could  be  the  subject  of  future  research. 

hen  a  transformation  from  the  class  DI  is  applied  to  a  simulation 
Bent,  it  is  most  often  to  induce  statistical  dependence  between 
that  were  originally  independent.  See  Chapter  5  for  two  examples, 
dependence  is  usually  induced  within  columns  of  identically 
buted  input  sequences  in  X,  but  may  also  be  across  columns, 
n  the  class  SA  it  is  often  of  practical  importance  to  distinguish 
n  those  transformations  that  yield  a  different  number  of  (usually 
ndent)  replications,  and  those  that  alter  the  length  of  an  output 
ce  in  a  single  "run."  Stated  another  way,  some  change  the  largest 
of  an  output  sequence,  while  others  result  in  additional  or  fewer 
ations  of  a  sequence. 


TO 


:ould  be  justified  if  the  fact  that  a  p^  is  known.  Again  using 
Llocation  m  *  n 


.022  .031 

a  -  +  - — 

n  2 

n 


iary  Information  (Al) 

Continuing  to  work  with  the  (original)  Z  estimator,  notice  that 

all  the  available  information  is  utilized.  Since  =  p  ,  use  the 

bservations  in  Y_,  and  vice  versa.  Thus,  both  Y«  and  Y,  are  based 
3  *  3 

observations,  and 


Var[zai] 


urse  Z  .  is  biased  because  Y„  and  Y_  are  dependent.  However,  it  is 

£11  c  ^ 

consistent. 


alent  Information  (El) 

Recall  the  original  estimator,  Z.  A  statistic  using  equivalent 
mation  is 


69 


if  the  ith  single  toss  is  1 


0,  otherwise  i  ■  1,2,...,m 


i3 


1 ,  if  the  ith  single  toss  is  2 


0,  otherwise  i  *  m+1,...,2n 


Let  the  statistic  be 


Z 

sa 


2Y  Y 
2*3 


cey  point  here  is  that  the  variance  of  Z  depends  on  how  the  2n 

Sfi 

as  are  allocated.  In  this  case,  the  optimum- allocation  is  to  let  m 
and 


T»H2aaJ 


talent  Allocation  (EA) 
Use  the  same  approach 


i3 


2’ 


l 

2’ 


0, 


as  in  illustrating  SA,  but  now  score 
if  the  ith  single  toss  is  2 

if  the  ith  single  toss  is  4 

otherwise  i  -  m+1,...,2n 
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pendence  Induction  (Dl) 

On  any  particular  pair  of  tosses,  the  outcome  (first,  second)  is 
ist  as  likely  as  the  outcome  (7  -  first,  7  -  second).  For  instance, 
is  events  (2,1)  and  (5,6)  have  the  same  probability  of  occurrence, 
tus,  if  (first,  second)  is  rolled  on  toss  2i  -  1 ,  use  (7-first, 
•second)  for  toss  2i.  This  causes 

Cov^I2i-1 ,1 ,r2i, J  ■ 

id  results  in 


ample  Allocation  (SA) 

Now  approach  the  original  problem  a  bit  differently.  Since  p  ■ 
PjP2  use  the  2n  single  die  tosses  to  estimate  p^  and  p£.  Let 
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Ti1 


1 ,  if  the  sum  of  the  ith  toss  is  3 


(4.4) 


0,  otherwise 


i  *  1 ,2, . . . ,n 


,'hus  p  »  Pr(Y^  *  1).  As  the  statistic  take 


2-1  ,, 

n  i-1  11 


Tor  which 

E[Z]  ■  p  and  Var[z]  * 

Por  convenience  later  define 

Pj  -  Pr(toss  of  a  single  die  »  j) 

The  experiment  is  defined  by  the  probabilities  p^  that  define  the 
working  of  the  dice  (inputs),  the  transformation  (4*4)  that  gives  the 
score  (outputs),  the  sampling  plan  R#  -  (2nj ,  and  the  statistic  Z. 

Distribution  Replacement  (DR) 

Suppose  that  the  working  of  the  dice  is  redefined  in  the  following 
way.  Let 

P1  *  P2  "  3 

and 

p3  "  p4  "  p5  "  p6  "  T7 

Thus  the  total  3  occurs  four  times  as  often,  on  average.  To  compensate 


.v 


1 


'-■j 


'U 


V 

V 
»  , 


V 


V 

"N 


the  inputs.  Thus,  {DR,  DlJ  are  necessary  to  generate  those 
transformations  that  redefine  the  inputs.  Similarly,  the  classes  (EA, 

SA}  and  {El,  Al}  are  necessary.  The  result  is  then  an  immediate 

consequence  of  Theorem  2,  which  shows  that  these  pairs  of  classes  are 
all  disjoint. 

qed 

Theorem  Under  the  loss  function 

i(z,e)  -  (z  -  e)2 

where  0  is  a  scalar,  there  exists  for  each  of  the  six  classes  a 

simulation  experiment  E  that  can  be  transformed  by  a  transformation 
whose  composition  includes  a  member  of  that  class  into  some  E  such  that 

E2[l(Z\9)]  <  Ez[l(Z,e)] 

where  Z  and  Z'  are  both  consistent  estimators  of  0. 

Proof.  The  proof  is  by  example.  The  example  used  was  originally 

suggested  by  Kahn  (1956)  to  explain  some  basic  VRTs.  Consider  the 
problem  of  estimating  the  probability,  p,  that  the  sum  of  two  fair  dice 
is  3.  Clearly  p  -  1/18,  but  suppose  that  this  is  not  known  and  p  will 
be  estimated  by  tossing  dice.  Toss  n  pairs  of  dice  (2n  single  dice),  or 
have  a  computer  program  simulate  these  tosses,  and  let 
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f'iki(c)  »*  fik|(c)  for  some  ik  lemoa  3* 

which  implies  that  f'iJc  j*  for  some  ik  by  the  definition  of  DR. 
Thus,  fc  DI  by  the  definition  of  DI. 

Next  consider  T~  6  DI.  Then  T_:  f  ->  f'  such  that 

2  2  u  (it 


f'  j4  f  and 
u  u 


Suppose  that 


This  implies  that 

f'  -  f 

U  ID 

by  lemma  3.  which  is  a  contradiction.  Thus,  there  exists  some  ik  such 
that  (4.3)  does  not  hold.  This  in  turn  implies  that  £  DR. 


qed 


corollary  2^1_:  Theorem  1  does  not  hold  if  any  class  is  eliminated  from 
the  six  classes. 

proof:  All  six  classes  of  transformations  are  needed  to  prove  Theorem  1 , 
unless  some  classes  contain  elements  that  have  the  same  effect  as. 
members  of  other  classes.  However,  by  definition  jEA,  SA}  and  { El ,  AX } 
do  not  transform  the  inputs,  so  no  composition  of  them  will  transform 
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T, 


7* 


hm<Y(m)’>  -> 


V  m 


Thus,  €  El  by  definition  of  El. 
Finally,  let 


T 


T  O1?  o  I  o  T  o  T  o  T  o  to  t 
0  L\  l2  3  4  5  6 


7 


Then,  by  construction 


T:  E  ->  E* 

and  T  is  a  composition  of  transformations  in  the  six  classes  DR,  DI,  EA, 
SA,  El,  and  Al. 


qed 


Theorem  2:  The  six  classes  of  transformations  are  disjoint. 


Proof:  For  any  given  experiment  set,  the  classes  EA,  SA,  El,  and  AI  are 
mutually  disjoint  by  their  definition  and  the  definition  of  simulation 
experiments.  Also,  they  are  clearly  disjoint  from  DR  and  DI.  What 
remains  to  be  shown  is  that  DR  and  DI  do  not  overlap. 

Consider  T,  €  DR.  Then  T, :  f  ->  f '  such  that 
1  1  u  u 


f’  +  f  and 
u  w 


f  ilc  1(c) 


fiki(c) 


V  ik 


Now  since  f'  j*  f  ,  then 

U  CDV 
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lgit}  i  -  1 .2,...  %  * 

and 

{g’iJ  i  -  1,2,...  i  -  1,2 . *** 

Let  T.  be  the  transformation  such  that 
4 

V  isil!  _>  c{g’il* 

* 

i  -  1 ,2,...  t  -  1  ,2, . . . 1  ' 

Thus,  6  EA  by  definition  of  EA  and  the  fact  that  0  is  always 
achievable.  Now  let  Tc  be  the  transformation  such  that 

T5:  0  ->  R’* 

Thus,  T5  e  SA  by  definition  of  SA  under  the  representation  g* . 

Next  consider  h(Y)  and  h'(Y'),  where  Y  °  Y’.  Since  different 
coding  is  irrelevant,  without  loss  of  generality  let 

h'  <-  h'  o  c 

where  0  denotes  composition.  Thus,  h(Y)  and  h* C Y)  are  being  compared. 
Let  Tg  be  the  transformation  such  that 

Tg:  (m)  ->  (m)'  V  m 

Thus,  Tg  €  AI  by  definition  of  AI.  Now  let  T^  be  the  transformation 


such  that 
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Proof:  The  proof  is  by  construction  of  T.  Assume  a  definition  of  E  and 
E*  given  by  (fu,(g;R#) ,h)  and  (f' u,(g' ;R'#) ,h‘ ) ,  respectively.  Let  TQ 
be  the  transformation  such  that 


T 


0 


R*  ->  0 


where  0  is  a  vector  of  all  zeroes  of  the  same  dimension  as  R-.  Thus,  T_€ 
—  0 

SA  since  £  is  always  an  achievable  sample  allocation. 

Next  consider  f ^(x)  and  f'u(x).  Both  have  the  same  support,  Q. 

Let 


”  fik  and 


n  f* 


ik 


be  the  product  of  all  scalar  marginal  distributions  of  f^  and  f  , 
respectively.  Let  T1  be  the  transformation  such  that 


1  ; 


f  ->  n  f.. 

<>i  ik 


Thus,  Tj  €  DI  by  definition  of  DI.  Now  let  Tg  be  the  transformation 
such  that 


T2:  11  fik  _>  11  f' 

Thus,  T^  €  DR  by  definition  of  DR.  Now  let 
such  that 


be 


the  transformation 


T 


3 


n  f’ik 


-> 


r 


Ul 


Thus,  Tj  6  DI  by  definition  of  DI. 

Next  consider  (g(x);0)  and  (g'(x);R'#)  defined  on  the  same  x. 


Consider  ' he  representations 


61 


f1 ,2|3,4,... ,n-i+1  "  fl|2,3 . n-i+1  f2 ! 3 ,4 , . • . , n-i+1 

2 | 1 • • • t n+i- 1  ij^f^,*.** n- i+1 

The  proof  proceeds  exactly  as  above  to  show  that  the  (n-i-1 )st  order 
scalar  conditionals  are  determined.  By  induction,  this  shows  that  all 
scalar  conditionals  (including  the  first  order  scalar  marginal 
distributions)  are  determined.  And  since 

f1 ,2,...,n  “  f1  f2 ! 1  f 3 1 1 ,2  fn|l,2,...,n-1 

then  f.  „  is  determined. 

1 ,2, . . .  ,n 

qed 

Main  Results 

In  this  section  the  main  results  of  this  research  are  proved; 
namely  that  for  a  given  experiment  set  Es(  n, 0)  the  associated  six 
classes  of  transformations  generate  all  possible  VRTs  via  composition, 
they  are  disjoint  classes,  and  they  are  useful.  Remember  that  these 
transformations  are  from  any  e-equivalent  class  of  experiments  to  any 
other  e-equivalent  class  in  ES(G,9).  In  the  results  that  follow,  E  and 
E'  are  both  elements  of  E  ( 0, 9). 

Theorem  1 :  Given  E  ^  E',  there  exists  a  transformation  T:  E  ->  E' ,  and  T, 
is  a  composition  of  members  of  the  six  classes  of  transformations. 
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»>n  .  r(x) 

1 i3  »4, . • • »n  1 |2,3» • • • »n 


(4.2) 


for  all  i  ■  (x, ,...,x  ).  It  is  easy  to  show  that  if  the  denominator  of 
i  n 

(4.2)  is  zero  for  some  x,  then  the  numerator  is  as  well;  in  fact  so  is 
f,  0  _.  Such  values  of  x  are  not  of  interest  and  the  value  of  r(x) 

will  be  left  undefined  there,  restricting  attention  to  values  of  x  where 
this  is  not  the  case. 

Now  for  any  fixed  value  of  (x^ ,x^, . . . ,xQ)  equation  (4.2)  gives 

1  ■  J'f2i3,4 . ndI2  ‘  f1|3,4 . .  I  r(l>dx2 

which  implies  that 

f  .  1 

l|3»4,....n  JrU)dx2 

By  the  argument  above,  r(x)  >0  in  the  regions  of  interest,  so  the 

density  exists.  Since  r(x)  is  given,  f. .  is  determined.  Using 

1  m»4»  » •  •  »n 

a  similar  argument,  it  can  be  shown  that  all  (n-2)nd  order  scalar 
conditionals 


jl  1  »•  •  •  t  »•••»& 


1 ,2, . . . ,n 


are  determined  by  the  (n-l)st  order  scalar  conditionals. 

The  remainder  of  the  proof  proceeds  by  induction.  Assume  that  the 

(n-i)th  order  scalar  conditionals  are  determined  for  some  i  •  2,3,...,n. 

For  example,  f  )p  ,  .  and  f  ,  „  ....  Then  write 

l \i , p, • . • ,n-i+i  2 i l ,3 ,4, • • • ,n-i+1 


* 


a 
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After  some  rearrangement 
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-  f/.M,,.  -S-^-  V  ilc 


(c)  |ik  ( c)  1  ilc  f^ 


(4.1) 


jroof:  Let  T  €  DR.  For  any  fixed  ik 


'  (c)  "  f  ik| (c)  “  fiki(c) 


(since  T  6  DR) 


f  f*  f ' 

a*  _ ik  .  ik 

f(  =  )  fi!c 


f '  f  ' 

.>  ,  m  f  lScl 

f  ik  (c)tiJc  f(c) 


f ' 

“>  f  ( c )  1  ik  "  f(c)iik7^J^ 


A  parallel  argument  shows  that  (4.1)  implies  the  definition  of  DR. 


Lemmas  1  and  2  are  alternative  definitions  of  the  class  DR.  A 
sufficient  condition  for  there  to  exist  a  transformation  in  DR  that 
transforms  f^  into  f'ik  j*  fik  is  that  Xik  and  X  -  X  are  independent. 


Transformations  in  the  AI  class  are  characterized  by  altering  the 
argument  set  of  the  function  h.  In  practice,  members  of  AI  most  often 
recruit  additional  outputs,  thus  providing  more  information  for 
estimation  of  8.  Restricting  attention  to  those  transformations  in  AI 
that  augment  the  original  argument  set,  it  is  possible  to  identify  two 
important  subclasses:  those  that  yield  more  information  about  0  (Al.e), 
and  those  that  recruit  outputs  containing  information  about  Z  (AI.Z). 

In  Chapter  3  the  concept  of  information  about  a  parameter  contained 
in  a  random  variable  was  discussed.  The  idea  can  be  extended  quite 
naturally  to  information  contained  in  one  random  variable  about 
realizations  of  another.  In  particular,  if  the  two  random  variables  are 
independent,  then  looking  at  the  realization  of  one  reveals  nothing 
about  the  other.  However,  if  the  two  are  statistically  dependent,  then 
the  realization  of  one  may  reveal  something  about  the  likelihood  of  the 
particular  realization  of  the  other.  Thus,  uncertainty  about  the 
likelihood  of  the  realization  of  a  statistic  may  be  reduced,  and  the 
estimate  may  be  modified  based  on  this  knowledge.  In  Chapter  5  h  VRT 


(control  variates) 

that 

makes 

use 

of  this 

type  of  information  is 

discussed.  Note 

that 

AI.  0 

and 

AI.Z  need 

not  be  disjoint,  but  that 

members  of  AI  -  AI. 

0  - 

AI.Z 

will 

never  be 

effective  for  reducing 

variance. 

The  Transformations  and  Information 

Throughout  the  development  of  this  research,  the  concept  of' 
statistical  information  and  its  usefulness  in  the  discussion  of  variance 


reduction  has  been  stressed.  The  six  classes  of  transformations  are  a 
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particularly  useful  breakdown  of  the  possible  transformations  because  of 
the  close  association  of  each  class  with  either  the  idea  of  increasing 
the  available  information  or  making  better  use  of  information.  As  noted 
briefly  above,  DR,  DI,  and  SA  transformations  can  increase  information 
in  the  simulation  experiment.  This  is  because  f  determines  the 
information  content  of  the  initial  sample  (X)  and  R#  controls  how  the 
sampling  effort  is  allocated.  On  the  other  hand,  g  and  h  determine  the 
information  loss  via  transformation  of  the  intial  sample. 
Transformations  in  the  EA,  El,  and  AI  classes  can  reduce  the  loss. 

It  should  be  noted  that  effective  members  of  the  El  class  of 
transformations  have  been  extensively  studied  in  the  classical 
statistical  literature  on  optimality  of  estimators.  Concepts  such  as 
sufficiency  and  minimum  variance  estimators  are  results  of  this  work. 
Estimators  using  auxiliary  information  have  also  been  studied  in  the 


same  manner. 


FIVE  WELL-KNOWN  VRTS 


In  this  chapter  five  of  the  more  well-known  VRTs  are  reviewed  in 
light  of  the  framework  developed  in  Chapters  3  and  4.  The  VRTa 
considered  are  antithetic  variates,  common  random  numbers,  control 
variates,  stratified  sampling,  and  use  of  conditional  expectations.  For 
each  a  brief  review  of  the  literature  is  presented,  along  with  a 
description  of  the  VRT  and  graphical  display  of  how  the  VRT  is  composed 
of  members  of  the  six  classes  of  transformations.  The  purpose  is  not  to 
propose  a  precise  definition  of  these  five  VRTs,  since  the  same  names 
are  used  for  several  variations.  Rather,  an  attempt  is  made  to  present 
the  most  widely  accepted  version  of  each  technique.  First,  a  symbol  set 
is  given  to  be  used  for  graphical  presentation. 


Symbol  Set 

Only  three  basic  symbols  are  needed  (see  Figure  5«l).  A  rectangle 
will  enclose  a  definition  of  an  input,  output,  or  statistic  in  the 
simulation  experiment.  A  circle  denotes  a  class  of  transformations,  and 
a  trapezoid  some  prior  knowledge  used  to  make  the  application  of  the 
transformation  reasonable.  The  progression  is  from  left  to  right, 
proceeding  from  a  definition  of  some  input,  output,  or  statistic  to  a 


new  definition  via  a  transformation. 


definition  of  random  variables 


class  of  transformations 


prior  knowledge 


progression 


Figure  5-1  Symbol  Set  for  VBTs 


Recall  that  VRTs  are  composed  of  members  of  the  six  classes  of 
transformations  combined  with  prior  knowledge;  given  a  specific  problem 
they  can  be  implemented  as  algorithms.  The  presentation  in  this  chapter 
is  on  the  VST  level. 

Antithetic  Variates  (  AV) 

Antithetic  Variates  is  a  VRT  that  has  been  extensively  studied  in 
the  context  of  Monte  Carlo  estimation.  The  technique  was  invented  by 
Hammersley  and  Morton  (1956),  with  further  developments  by  Hammersley 
and  Mauldon  (1956),  Morton  (1956),  Halton  and  Handscomb  (1957),  and 
Handscomb  (1958).  In  its  broadest  sense,  "we  use  the  term  antithetic 
variates  to  describe  any  set  of  estimators  which  mutually  compensate 
each  other's  variations."  (Hammersley  and  Handscomb,  1964,  p.  61 ) 
Statistical  results  such  as 

Var(Y  *Y.)  -  Var(Y.)  ♦  Var(Y.)  ♦  2Cov(Y  ,Y  .)  (5.1 ) 

i  J  A  J  1  J 

make  clear  the  advantage  of  random  variables  being  correlated. 
Correlation  may  be  inherent  in  the  outputs  of  a  simulation  experiment. 
However,  AV  attempts  to  force  a  correlation  structure  onto  the 
observations  while  preserving  their  marginal  distributions.  The 
correlation  is  usually  accomplished  by  making  them  analytically 
dependent.  When  an  estimator  consists  of  a  sum  of  n  random 
observations,  the  antithetic-variates  theorem  (Hammersley  and  Mauldon, 
1956,  Handscomb,  1958,  Wilson,  1979,  1982a,  1983c)  shows  that  under 
fairly  general  conditions  the  greatest  lower  bound  of  the  variance  of 
the  estimator  can  be  approached  arbitrarily  closely  by  generating  all  n 


observations  from  a  deterministic  transformation  of  one  random 
observation.  Texts  that  discuss  AV  in  both  the  Monte  Carlo  and 
simulation  contexts  include:  Tocher  (1963),  Hammersley  and  Handscomb 
(1964),  Shreider  (1964,  1966),  Mize  and  Cox  (1968),  Meier,  Newell  and 
Pazer  (1969),  Fishman  (1973,  1978),  Gaver  and  Thompson  (1973),  Hillier 
and  Lieberman  (1974),  Kleijnen  (1974),  Carter  and  Cashwell( 1 975) , 
Yakowitz  (1977),  Pritsker  and  Pegden  (1979),  Rubinstein  (1981),  Kohlas 
(1982),  Law  and  Kelton  (1982),  and  Payne  (1982).  Research  into  the 
application  of  AV  in  the  simulation  of  stochastic  networks  has  been  done 
by  Burt,  Gaver,  and  Perlas  (1970),  Burt  and  Garman  (1971  a,  1971b), 
Sullivan,  Hayya,  and  Schual  (1982),  Carson  (1983),  Grant  (1983,  1980), 
and  Grant  and  Solberg  (1983);  Kumamoto,  Tanaka,  Inoue  and  Henley  ( 1980a) 
investigate  Monte  Carlo  evaluation  of  fault  trees.  Moy  (1965,  1971), 
Page  (1965),  Gaver  (1969),  and  Mitchell  (1973)  use  AV  for  simulating 
queuing  systems.  George  (1977)  looks  at  simulating  replacement  processes 
with  AV,  while  Fishman  (1981,  1982a,  1982b,  1963a)  deals  with  simulation 
of  Markov  chains  and  processes.  Issues  relating  to  the  generation  of 
antithetic  observations  in  various  contexts  are  discussed  in  Fishman 
(1972,  1974),  Franta  (1975),  and  Cher.g  (1981,  1982,  1983a,  1983b). 
Combining  AV  and  other  VRTs  is  examined  in  Fishman  (1974),  Gentle 
(1975),  Kleijnen  (1974,  1975);  Schruben  (1978,  1979),  Schruben  and 
Margolin  (1978),  and  Cooley  and  Houck  (1982)  incorporate  AV  and  CRN  (see 
below)  into  the  design  of  simulation  experiments.  Roach  and  Wright 
(1977)  correctly  state  that  systematic  sampling  (SYS)  plans  are  a  subset 
of  general  AV  sampling  plans;  see  Madow  and  Madow  (1944)  for  a  reference 
on  SYS.  Other  papers  of  interest  are  Deutsch  and  Schmeiser  (1980), 


Fishman  (1979,  1968),  McGrath  and  Irving  (1973a,  1974),  Simon  (1976), 

Lavenberg  and  Welch  (1978),  Halton  (1979),  Rubinstein  and  Samordnitsky 
(1982),  Rubinstein,  Samordnitsky,  and  Shaked  (1982),  and  Wilson  (1982, 
1983a,  1983b,  1983d). 

Consider  estimating  01  using  a  simulation  experiment  defining 

1"-2 . Ii'2n 


where  EtY.^ ) 


0^ ,  with  statistic 


—  £  Y.. 
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Further,  suppose  that 


X(ii)  1.1. d. 


(5.2) 


The  usual  AV  transformation  is  to  redefine  the  joint  distribution  of 


(X(2i-1  ,1;  ’  X(2i,l))  1-1 

such  that  they  still  have  the  marginals  given  in  (5.2),  but  the  pairs 
are  negatively  correlated  in  some  way.  When  X^  ^  is  a  scalar,  or  if  AV 
is  only  used  on  a  scalar  component  of  ,  the  correlation  is  most 

often  induced  by  generating  realizations  via  the  inverse  cumulative 
distribution  function  (cdf)  of  X^^  in  the  following  manner: 

X( 2 i— 1 , 1 )  *  F’1(l)(Ui) 
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X(2i,l)  3  F~1(l)(l~Ui) 

where  IL  “  U(0,l)  i  *  When  X^j  is  multivariate,  there  are 
a  variety  of  approaches  and  objectives.  The  reason  for  redefining  the 
inputs  to  be  dependent  is  to  cause 


Cov(Y 


2i-1  ,1 


)  <  0 


reducing  the  variance  of  via  (5*1 )•  Figure  5-2  shows  how  AV  employs 
the  DI  class  of  transformations. 


Common  Random  Numbers  (CRN) 

Common  random  numbers  is  often  called  "correlated  sampling"  (CS). 
There  is  some  confusion  because  CRN  is  both  a  method  for  generating 
correlated  samples  and  a  VRT  that  exploits  induced  correlation.  "The 
name  of  the  technique  stems  from  the  possibility  in  some  situations  of 
using  the  same  stream  of  basic  U(0,l)  random  variables  to  drive  each  of 
the  alternative  models  through  time..."  (Law  and  Kelton,  1982,  p.350). 
Here,  the  term  CRN  is  used  in  the  sense  of  CS,  meaning  that  correlation 
is  induced  (by  whatever  means)  between  certain  inputs  to  obtain 
positively  correlated  outputs,  and  the  interest  is  in  estimating  a 
parameter  that  can  be  expressed  as  a  difference. 

CRN  has  the  distinction  of  being  "...the  only  VRT  that  is  as  a  rule 
used  by  practitioners  of  simulation"  (Kleijnen,  1974,  p.  206).  Papers 
and  books  discussing  various  aspects  of  CRN  include  Kahn  and  Marshall. 
(1953),  Jessop  (1956),  Conway  (1963),  Fishman  (1968,  1974),  Ignall 
(1972),  McGrath  and  Irving  (1973a,  1 97 4  9,  Becker  (1974),  Kleijnen  (1974, 
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75),  Lavenberg  and  Welch  (1978),  Heidelberger  and  Iglehart  (1978), 
itaker  and  Pegden  (1979),  Rubinstein  (1981),  Gal,  Rubinstein,  and  Ziv 
981),  Wilson  (1982a,  1982b,  1983a,  1983b),  Law  and  Kelton  (1982), 
hlas  (1982),  Banks  and  Carson  (1983),  and  Bratley,  Fox  and  Schrage 
983).  Gentle  (1975)  calls  the  technique  control  variates.  Mihram 
974),  Heikes,  Montgomery,  and  Rardin  (1976),  Schruben  (1978,  1979), 
hruben  and  Margolin  (1978),  and  Cooley  and  Houck  (1982)  investigate 
icorporating  CRN  into  the  design  of  the  simulation  experiment  as  a 
indom  block  effect.  The  last  three  papers  consider  incorporating  AV 
uth  CRN,  as  do  Fishman  (1974)  and  Kleijnen  (1974,  1975).  Wright  and 
uasey  ( 1 97 9 )  give  a  simple  example  of  an  inventory  simulation  where 
jing  CRN  gives  counterintuitive  results. 

Consider  estimating 


t  relation 


Var(Y2  -  Y5)  -  Var(Y2)  +  VarCY^)  -  2Cov(Y2,Y3) 

idefining  the  joint  distribution  of  (*(i2)  *  X^^))  i  *  1,2,...,I^ 
that  they  are  positively  correlated — without  redefining  their 
Lnal  distributions — it  is  hoped  that  Cov(Y2,Yj)  >  0,  reducing  the 
ince  of  .  Several  of  the  references  cited  discuss  conditions  and 
)ds  that  insure  a  favorable  covariance  term.  See  Figure  5.3  for  a 
lical  presentation  of  this  VRT. 


rol  Variates  (CV) 

The  term  control  variates  has  a  variety  of  meanings.  Here,  it  will 
used  to  describe  a  class  of  statistics  that  attempt  to  correct  the 
e  of  an  estimator  based  on  the  discrepancy  between  the  value  of  a 
nd  estimator  and  the  known  value  of  its  expectation.  For  example, 


Y^  j  and  Y^  be  sets  of  output  random  variables  in  a  simulation 


riment,  and  s1  and  s2  be  scalar  valued  functions  such  that 


e[s1 (Y(1 ))]  -  91  E[s2(Y(2))J  -  a 


e  0.|  and  a  are  real  scalars;  8^  is  the  parameter  of  interest  and  a 
nown.  The  two  most  common  CV  estimators  are  the  linear  control 


z=  ■  •,»(.)>  -  k<32‘Y(2)>  -  ■> 


(5.3) 


e  b  is  a  constant,  and  the  ratio  estimator 
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3iCYd  )J 

w 


(5.4) 


ction  a  ia  the  control  variate.  Both  (5.3)  and  (5.4)  extend 
ly  to  multiple  control  variates,  but  that  extension  will  not  be 
red  here.  Also  not  discussed  is  the  determination  of  the 
er  b,  except  to  cite  several  references.  In  the  simulation 
ure,  a  distinction  is  made  between  "internal"  control  variates 
variables  that  are  part  of  the  same  real  or  conceptual  system) 
ternal"  control  variates  (random  variables  that  are  part  of  a 
real  or  conceptual  system).  This  distinction  is  not  relevant 
dnce  both  types  are  simply  functions  of  outputs  in  the  simulation 
tent.  However,  external  control  variates  employ  a  transformation 
le  class  DI,  while  internal  CV  does  not. 

ixtbooks  providing  general  descriptions  of  CV  for  Monte  Carlo 
;ion  are  Hammersley  and  Handscomb  (1964),  Shreider  (1964,  1966), 
:z  (1977),  and  Rubinstein  (1981 ).  Concentrating  more  specifically 
lulation  are  Tocher  (1963),  Fishman  (1973,  1978),  Gaver  and 

>n  (1973),  Kleijnen  (1974),  Pritsker  and  Pegden  (1979),  Law  and 
(1982),  and  Bratley,  Fox,  and  Schrage  (1985);  see  Cochran  (1977) 
icussion  of  CV  in  survey  sampling.  Use  of  CV  in  the  simulation  of 
itic  networks  is  investigated  in  Burt,  Gaver  and  Perlas  (1970), 
id  German  (1971),  Grant  (1980)  and  Grant  and  Solberg  (1983),  while 
;o,  Tanaka,  and  Inoue  (1977)  apply  CV  to  fault  tree  analysis. 
>rg,  Moeller,  and  Welch  ( 1977a,  1977b,  1978,  1982),  Taaffe  and 

,1983),  Wilson  (1979b,  1983e,  1983f),  Wilson  and  Pritsker  (1982) 

■  th  CV  in  the  simulation  of  queuing  .lystems.  The  selection  or 
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n  of  the  CV  multipliers  under  various  assumptions  for  linear 
like  (5.3)  has  received  considerable  attention;  see  for 
Cheng  (1978),  Hopmans  and  Kleijnen  (1980)  and  Koehler  (1981). 
ind  Beale  (1983)  discuss  verification  of  the  hypothesis  that  the 
ihip  with  the  CV  is  indeed  linear.  Olkin  (1958),  Matern  (1962), 
977),  and  Isaki  (1983)  treat  CV  in  the  survey  sampling  context, 
(1956),  Swain  (1981,  1982),  and  Swain  and  Schmeiser  (1983) 
ite  on  Monte  Carlo  problems.  Iglehart  and  Lewis  (1979)  consider 
the  general  context  of  regenerative  simulations.  Initial 
Lons  of  CV  in  simulations  used  a  single,  linear  control.  In  an 
t  survey  paper,  Lavenberg  and  Welch  (1981)  summarize  results  on 
lltiple  linear  control  variates.  Rubinstein  and  Markus  (1982) 
iese  results  to  estimation  of  multiple  parameters  with  multiple 
,  and  Nozari,  Arnold,  and  Pegden  ( 1 933)  extend  them  to 
ilation  simulation  experiments.  Other  papers  of  interest  are: 
i  and  Ben  Tuvia  (1962),  Moy  (1965,  1971),  Gaver  (1969),  Gaver 
Ler  (1971),  McGrath  and  Irving  (1973a,  1974),  Gentle  (1975), 

5  and  Welch  (1978),  Cheng  and  Feast  (1980)  and  Wilson  (1982b, 
383  b) . 

are  describing  the  general  characterization  of  CV  an  expression 
variance  of  a  function  of  two  random  variables  will  be  derived, 
nd  be  two  real,  scalar  valued  random  variables  with  finite 
Let 

S[Q.]  -  y.  i  =  1  ,2 

a  function  h(Q1  ,Q2)  that  is  analytic  at  (y^t^)  for  all  values 
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of  Qj  and  Q2*  Then  to  two  terms,  the  Taylor  series  expansion  of  h  about 
this  point  is 

2  3h(Y.,,Y5) 

h(Q1,Q2)  -  h(Yl,Y2)  +  ^  - -  (Q±  -  vt)  *  r2  (5.5) 

where  R2  is  a  remainder  term  given  by  the  next  term  in  the  Taylor 
expansion  with  h  evaluated  at  an  unknown  point  between  y^  and  i  -  1 ,2 
for  Q1  and  Q^,  respectively.  Ignore  this  term  for  the  moment.  Using 
e[q±]  ■  Yi»  (5.5)  implies  that 

E[h(Q1 ,Q2)J  “  h(Yt ,Y2) 

and 

E[h2(QJ,Q2)]  -  h2(Y?,Y2)  ♦' 


2  2 
£  £ 
i-1  j-1 


3h( Yj ,Y2 


)  3h(Y,  ,Yo) 


Cov^Qi,Qj-* 


Combining  these  two  gives 


Var[h(Q  ,Q  )]  «  £  £ 

i-1  j-1 


2  3h(Y1fY2)  3h(Y1 ,Y2) 


Cov[Q.,Qjj 


(5.6) 


Interestingly,  to  make  (5.6)  an  equality  only  requires  adding  the  term 
Var(R2)  to  the  right  hand  side. 

Now,  returning  to  the  description  given  at  the  beginning  of  this 
section,  the  CV  estimator  will  be  of  the  form 
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Zc  *  h(VI(l))'82(l(2))) 


with  the  restriction  that  h( 9 ^ ,o)  »  0^  Assume  that  h  is  analytic.  It 
is  clear  that  (5.3)  and  (5.4)  are  of  this  general  form.  Several  authors 
have  noted  that  these  two  estimators  are  in  the  same  general  class, 
including  Kleijnen  (1974)  and  Olkin  (1983). 


Result: 


Var(z  )  * 

c 


where  all  the  partial  derivatives  are  evaluated  at  (e^,a). 

Proof:  In  (5.6),  identify  s^  with  Q^,  with  y^  ,  and  a  with  y 2< 


Note  that  for  the  linear  control,  R2  »  0.  The  result  shows  that 
nonzero  covariance  between  the  estimator  of  the  parameter  of  interest 
and  the  control  variate  is  usually  necessary  for  CV  to  be  effective. 
The  estimator  s^  contains  information  about  s^ ,  in  the  sense  that 
uncertainty  about  the  expected  value  of  s^  is  reduced  by  knowledge  of 
Bg.  The  covariance  term  represents  this  information.  See  Figure  5.4 
for  a  description  of  how  CV  combines  transformations  from  the  El  and  AI 
classes. 


Stratified 


(STRAT) 


Books  discussing  stratified  sampling  in  Monte  Carlo  problems 
include  Hammersley  and  Handscomb  (1964),  Shreider  (1964,  1966)  and 
Rubinstein  (1981).  In  the  context  of  survey  sampling,  see  Cochran 


inpuus 


Statistics 


Z1  *  hl^Y(l)^ 

*  Sl(Yd)} 
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(1977).  Some  books  or  chapters  dealing  with  stratified  sampling 
specifically  in  systems  simulation  are  Moy  (1971 ),  Kleijnen  (1974), 
Hillier  and  Lieberman  (1974),  Pritsker  and  Pegden  (1979),  Payne  (1982), 
and  Bratley,  Pox  and  Schrage  (1983).  Methods  for  setting  stata 
boundaries  are  examined  in  Delanius  (1950),  Delanius  and  Gurney  (1951), 
Sethi  (1963),  and  Singh  (1975a,  1975b,  1977).  Papers  of  general 
interest  in  simulation,  Monte  Carlo,  and  survey  sampling  contexts 
include  Ehrenfeld  and  Ben-Tuvia  (1962),  Moy  (1965),  Sardnal  (1968),  Burt 
and  Garman  (1971  a),  Bayes  (1972),  McGrath  and  Irving  (1973a,  1974), 
Gentle  (1975),  Hartley  (1977),  and  Wilson  (1979b,  1982b,  1983a,  1983b). 
Kahn  (1950a,  1950b)  and  Steinberg  (1963)  discuss  it  under  the  name 
"quota  sampling."  For  interesting  application  papers  see  Surkis,  Gordon, 
and  Hynes  (1975),  Gordon  and  Hynes  (1978),  and  Diegert  and  Diegert 
(1981).  DeGroot  and  Starr  (1969)  look  at  the  problem  from  a  Bayesian 
viewpoint;  the  stratum  means  and  proportion  of  the  population  in  each 
stratum  have  prior  distributions. 

Consider  estimating  0^  when  it  is  possible  to  sample  I1 
observations  of  Y^  ,  where  E(Y^ )  *  0^,  i  *  1,2,...,I^.  Thus,  the  crude 
estimator  of  might  be 
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some  fixed  column  index  k  of  X.  For  convenience  write 


(5.7) 


suppressing  the  Assume  that  are  i.i.d.  random  variables  for 

all  i,  and  that  the  range  of  X  can  be  divided  into  n  nonoverlapping, 
exhaustive  intervals  (strata).  Denote  these  strata  by 
L.  ,  j  «  2,...,n+1.  An  equivalent  way  to  view  (5.7)  is 

J 

*11  '  «i3(V  . . i‘1 . 

such  that  Y^  is  the  ith  observation  of  Y1  when  6  Lj ,  and 


1  “  X1 
j=2  J  1 


Now,  if  p.  »  P(X  €  L.)  is  known  for  all  j,  and  if  the  values  of 
J  ^  J 

I.  ,  j  *  2,...#n+1  can  be  fixed  arbitrarily,  then  the  STRAT  estimator 
J 

z\  ■  h'i(i(o) 


n+1  i 
Z  p .  J  I  z  Y^ 

j-2  J  j  i-1  J 


may  have  smaller  variance  that  Z. ,  depending  on  the  allocation  I*  .. 

1  J 

Allocation  strategies  will  not  be  discussed  here  (see  for  instance, 

Cochran,  1977),  but  if  proportional  allocation  (I.  -  Ip.)  is  used  then' 

J  I  J 
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Var(z‘  ^ )  <  Var(Z1 ) 

(Rubinstein,  1931).  Note  that  if  I .  is  not  altered  then  the  VRT  is 

0 

known  as  poststratified  sampling.  See  Figure  5.5. 


Conditional  Expectations  (CE) 

Conditional  expectations  is  often  called  "conditional  Monte  Carlo" 
(CMC).  However,  CMC  is  a  sampling  technique  originally  developed  by 
Trotter  and  Tukey  (1956)  to  "use  a  family  of  transformations  to  convert 
given  samples  into  samples  conditioned  on  a  given  characteristic  (p. 
64)."  The  original  CMC  was  not  inherently  a  variance  reduction 
technique,  although  when  used  as  one  it  is  most  closely  akin  to 
importance  sampling  (Dubi  and  Horowitz,  1979).  Here,  the  term 
conditional  Monte  Carlo  is  reserved  for  the  original  sampling  technique. 
Other  references  include  Arnold,  Bucher,  Trotter,  and  Tukey  (1956), 
Hammersley  (1956),  Wendel  (1957),  Hammersley  and  Handscomb  (1964), 
Granovsky  (1981 ),  Hubinstein  (1981),  and  Wilson  (1983b). 

The  use  of  conditional  expectations  (CE)  will  be  described  as  the 
term  is  used  by  Law  and  Kelton  (1982).  Fishman  (1973)  and  Pritsker  and 
Pegden  (1979)  refer  to  it  as  use  of  "prior  information";  Carter  and 
Ignall  (1970,  1975)  use  the  term  "virtual  measures."  Brown,  Solomon,  and 
Stephens  (1979)  use  CE  for  estimating  the  expected  number  of  renewals  in 
[0,tj  for  a  renewal  process,  while  Andrews,  Bickel,  Hampel,  Huber, 
Rogers  and  Tukey  (1972)  employ  conditioning  in  Monte  Carlo  estimation  o.f 
location  parameters.  Lavenberg  and  Welch  (1979)  surveys  applications  of 
CE.  Other  papers  of  interest  are  Kahn  (1950,  1956),  Kahn  and  Marshall 


(1953),  McGrath  and  Irving  (1973a,  1974),  Gross  (1973),  Simon  (1976), 
Lavenberg  and  Welch  (1978),  and  Wilson  (1982,  1983a,  1983b).  The  latter 
paper  by  Wilson  compares  CMC  and  CE.  Fox  (1983)  establishes  conditions 
that  guarantee  effectiveness  of  CE  when  based  on  correlated  outputs;  see 
also  Bratley,  Fox  and  Schrage  (1983). 

Consider  estimating  8^  using  a  simulation  experiment  defining 


gi1(X(il)} 


1  , 2  , . . . ,  I , 


However,  suppose  there  is  another  output  random  variable 
7^2  ^  "*  1 ,2,  •  •  •  flg 

*[*1  *12  ’ 

can  be  calculated  for  all  realizations  y^2  of  Y^.  Here  Y^  is  generic 
for  any  of  Y^ .  Based  on  the  well-known  relation 

VarCELY,  !Yi2J]  -  Varfy  ]  -  EtVartY, iY12]] 

use  the  estimator 


h',(Y(i).) 


L2  1-1 


,£. 


(5.9) 


The  estimator  (5.9)  is  unbiased  for  8^ ,  and  if  the  Y^  are  independent 
then  it  has  no  greater  variance  than  (5.8)  if  a  10.  See  Fox  (1983) 


for  more  general  conditions.  However,  CE  is  often  employed  when 
I  >  I  such  as  when  Y.1  are  results  of  "rare  events."  Clearly  the 
estimator  (5.9)  may  be  based  on  a  vector  of  outputs,  not  ju3t  a  scalar 
Y-2.  Note  that  Y^  has  not  been  redefined,  but  rather  other  outputs  in 
the  simulation  experiment  are  used.  CE  is  shown  in  Figure  5.6. 


Inputs 


X  %  f  (x) 
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Outputs 
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CONCLUSIONS 


Although  the  research  presented  here  may  seem  remote  from  actual 
problems,  it  should  have  ramifications  on  practical  simulation 
experiments.  Conversations  with  practitioners  indicate  that,  with  the 
exception  of  the  most  simple  applications  of  AV  and  CRN,  variance 
reduction  techniques  are  seldom  employed.  This  is  due  partly  to  a  lack 
of  knowledge  and  understanding;  to  the  casual  student  of  simulation, 
variance  reduction  appears  to  be  a  collection  of  special  purpose 
techniques  that  need  to  be  rederived  for  each  application.  The 
existence  of  an  underlying  theory  and  a  small  number  of  elemental 
transformations  provides  structure  not  previously  available.  This 
structure  should  facilitate  coherent  instruction  in  variance  reduction 
and  also  provide  common  ground  for  reporting  applications  that  take 
place.  The  distinction  between  transformations,  VRTs,  and  algorithms  is 
central:  1)  The  six  classes  generate  all  of  variance  reduction,  in  a 
sense  providing  a  checklist  of  possible  approaches  to  take  based  on  what 
prior  knowledge  is  available.  2)  Graphical  presentation  of  VRTs 
(Chapter  5)  provides  a  clear  description  of  general  VRTs,  what  knowledge 
is  commonly  needed  to  "make  them  go,"  and  yet  does  not  relate  them  to  a 
particular  application.  3)  Algorithms,  the  problem  specific  part,  do  not 
seem  as  ad  hoc  when  they  are  just  examples  of  general  methods  and  even 
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more  basic  transformations.  The  definition  of  simulation  experiments 
(Chapter  3)  not  only  provides  the  structure  needed  to  prove  the  results 
of  Chapter  4,  but  also  structures  the  thinking  of  the  experimenter, 
helping  him  to  recognize  prior  knowledge  that  can  be  exploited. 

The  research  presented  here  depends  heavily  on  the  usefulness  and 
validity  of  the  definition  of  simulation  experiments;  this  definition  is 
consistent  with  the  general  idea  of  an  experiment  in  probability  and 
statistics,  and  appears  to  be  broad  enough  to  encompass  Monte  Carlo 
experiments  and  survey  sampling.  We  suspect  that  the  characterization 
covers  any  "sampling  experiment,"  but  do  not  know  how  to  prove  such  a 
conjecture.  The  terms  and  definitions  used  are  well-known  statistical 
objects:  sample  spaces,  events,  probability  distributions,  and 
transformations.  The  formal  definition  permits  investigation  of  issues 
such  as  experimental  design,  restrictions  on  sampling,  efficient 
estimation  techniques,  and  the  trade  off  between  variance  and  bias.  An 
unexpected  bonus  is  that  numerical  techniques  and  analytic  solutions  are 
special  cases.  Finally,  the  axioms  are  few  and  reasonable:  the 
experiment  is  relevant  to  the  estimation  problem,  and  the  experiment  can 
be  performed. 

Other  approaches  could  have  been  taken.  The  definition  ignores 
issues  of  model  validity,  implementation  of  computer  algorithms,  and 
numerical  limitations  of  the  computer.  It  is  often  useful  (as  a 
modelling  perspective)  to  view  a  computer  simulation  as  a  stochastic 
process,  but  our  definition  does  not  do  30.  In  variance  reduction  we. 
are  concerned  with  statistical  properties  of  estimators  in  a  sampling 
experiment,  and  we  capture  that  aspect. 
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Recent  research  in  the  area  of  variance  reduction  has  emphasized 
the  application  of  VRTs  in  specific,  but  hopefully  broad  classes  of 
problems.  Attempts  have  been  made  to  specify  rather  veak  conditions 
under  which  a  variance  reduction  i3  guaranteed.  Ve  hope  that  our 
research  will  accelerate  the  effort,  making  it  possible  to  derive  even 
more  general  conditions  for  even  broader  classes  of  problems.  The 
ultimate  achievement,  which  is  probably  not  possible,  would  be  necessary 
and  sufficient  conditions  under  which  application  of  each  class  of 
transformations  would  achieve  a  variance  reduction. 

As  mentioned  above,  the  definition  of  simulation  is  quite  broad  in 
scope.  Certainly  the  inclusion  of  survey  sampling  should  be  studied 
more  thoroughly.  By  having  such  a  general  model  of  sampling 
experiments,  defining  the  differences  between  particular  cases  such  as 
"systems  simulation,"  Monte  Carlo,  and  survey  sampling  is  possible.  For 
instance,  McGrath  and  Irving  ( 1973a)  stated  that  simulation  experiments 
can  be  viewed  as  Monte  Carlo  estimation  problems  like  (2.2).  The 
definition  reveals  why  this  is  sometimes  difficult.  In  systems 
simulation  the  integral  might  be  of  the  form: 

0  -  J  g(x)f(x|e)dx 
A 

where  0  is  a  function  of  g(x). 

To  attack  the  problem  of  determining  necessary  and  sufficient 
conditions  for  a  transformation  to  be  effective,  the  theory  of 
statistical  information  might  be  a  key  tool.  An  investigation  of  the 
types  and  value  of  different  kinds  of  auxiliary  information  is  in  order. 
The  result  in  Chapter  5  demonstrates  that  linear  correlation  can  be 
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With  the  expanding  use  of  computer  simulation  to  model  and  solve 
industrial  engineering  problems,  there  has  been  increasing  interest  in  the 
development  of  efficient  simulation  techniques.  When  the  concern  if  for 
statistical  efficiency  of  results  that  are  random  variables,  such  approaches 
are  usually  called  variance  re-'act ion  techniques  (VRTs). 

Many  of  the  fundamental  id  as  in  simulation,  and  particularly  techniques 
for  efficient  simulation,  had  their  origins  in  the  Monte  Carlo  estimation 
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literature.  The  theory  of  sampling  is  another  closely  related  field  that 
predates  the  development  of  simulation.  Although  there  has  been  significant 
research  interest  in  variance  reduction,  there  have  been  few  attempts  to 
structure  and  define  the  discipline. 

VRTs  are  transformations.  They  transform  simulation  experiments  into 
related  experiments  that  yield  better  estimates  of  some  parameters  of 
interest,  where  better  usually  means  more  precise.  This  research  identifies 
and  defines  the  components  from  which  all  variance  reduction  techniques  are 
built.  Given  a  general  mathematical-statistical  definition  of  simulation 
experiments,  these  components  or  classes  of  transformations  are  shown  to  be 
useful,  to  be  mutually  exclusive,  and  to  generate  all  possible  VRTs  via 
composition.  Benefits  of  the  research  include:  1)  the  facility  to  decompose 
VRTs  into  combinations  of  transformations,  making  the  relationships  between 
VRTs  clear,  2)  the  facility  to  unambiguously  define  new  or  existing  VRTs, 
eliminating  confusion  that  currently  exists  in  the  literature,  3)  the 
development  of  a  theoretical  foundation  for  analytical  treatment  of  VRTs, 
and  4)  the  development  of  a  setting  for  proposing  new  VRTs  and  research 
questions.  In  addition,  increased  understanding  of  the  area  should  promote 
more  and  better  application  of  variance  reduction  in  practice. 
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